Coalescents with multiple collisions, also known as $\Lambda$-coalescents,
were introduced by Pitman and Sagitov in 1999. These processes describe
the evolution of particles that undergo stochastic coagulation in such a
way that several blocks can merge at the same time to form a single block.
In the case that the measure $\Lambda$ is the Beta$(2 - \alpha, \alpha)$
distribution, they are also known to describe the genealogies of large
populations where a single individual can produce a large number of
offspring. Here, we use a recent result of Birkner et al. to prove that
Beta-coalescents can be embedded in continuous stable random trees,
about which much is known due to the recent progress of Duquesne
and Le Gall. Our proof is based on a construction of the Donnelly-Kurtz
lookdown process using continuous random trees, which is of
independent interest. This produces a number of results concerning
the small-time behavior of Beta-coalescents. Most notably, we recover
an almost sure limit theorem of the present authors for the number of
blocks at small times and give the multifractal spectrum corresponding
to the emergence of blocks with atypical size. Also, we are able
to find exact asymptotics for sampling formulae corresponding to the
site frequency spectrum and the allele frequency spectrum associated
with mutations in the context of population genetics.

1. Introduction and preliminaries. Consider the following simple
population model. Assume that the size of the population stays constant, equal
to a fixed integer $n \ge 1$, where individuals are numbered $1, \dots, n$.
In this population, each individual reproduces at rate $(n - 1)/2$. When
individual $i$ reproduces, she gives birth to two children. One of them is
again called individual $i$ and the other replaces individual $j$ for a
randomly chosen label $j \ne i$ with $1 \le j \le n$. If $t > 0$ is a fixed
time, we may define an ancestral partition $(\Pi^t_s, 0 \le s \le t)$ for this
population model by saying that $i$ and $j$ are in the same block of
$\Pi^t_s$ if and only if the corresponding individuals at time $t$ have the
same ancestor at time $t - s$. It is elementary to check that the dynamics of
the process $(\Pi^t_s, 0 \le s \le t)$ are governed by the rules of a
process called Kingman's coalescent. This is a Markov process characterized
by the fact that the only transitions are those where pairs of blocks merge
and any given pair of blocks merges at rate 1 independently of everything
else. In fact, even for more realistic population models, it is often the case
that the genealogy of a small sample of a population may be effectively
described by Kingman's coalescent; the introduction of this tool by Kingman
[33, 34] was a major development in population genetics. One of the great
advantages of this theory is that it is well adapted to the statistical analysis
of molecular population samples since, for instance, in this framework, one
can deal with a population sample rather than the population as a whole.
Moreover, molecular and genetic data convey much information about
ancestral relationships in a population sample. Much background material on
the use of coalescent models in the field of population genetics can be found
in the recent book [29] or in the review paper [25].
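
The claim that any given pair of lineages merges at rate 1 can be checked directly from the reproduction rule. The short computation below is a sketch that uses only the rates stated above: looking backward in time, the lineages of $i$ and $j$ merge exactly when one of the two individuals reproduces and its second child replaces the other.

```latex
\[
\underbrace{\frac{n-1}{2}\cdot\frac{1}{n-1}}_{i \text{ reproduces and replaces } j}
\;+\;
\underbrace{\frac{n-1}{2}\cdot\frac{1}{n-1}}_{j \text{ reproduces and replaces } i}
\;=\; \frac12 + \frac12 \;=\; 1 .
\]
```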

However, recent work (see, e.g., [21, 40, 49, 51]) has shown that Kingman's
coalescent is not very well suited when we deal with populations where
individuals may give birth to a large number of offspring or when we consider
the genealogy of a population affected by repeated beneficial mutations [20].
In these cases, it is more appropriate to model the merging of ancestral
lines by coalescent processes that allow multiple collisions, that is, several
blocks may merge at once, although only one of those events may occur at a
given time. These processes, called $\Lambda$-coalescents, have been
introduced and studied by Pitman [45] and Sagitov [49]. As shown by Pitman
[45], they are Markov processes in which any given number of blocks may merge
at once and are characterized by a finite measure $\Lambda$ on $[0, 1]$. The
$\Lambda$-coalescent has the property that whenever the process has $b$
blocks, any given $k$-tuple of blocks merges at a rate given by

    $\lambda_{b,k} = \int_0^1 x^{k-2} (1 - x)^{b-k} \, \Lambda(dx)$;

see the next section for a more precise definition. For instance,
Schweinsberg [51] showed that $\Lambda$-coalescents arise as the rescaled
genealogies of some population models where individual offspring distributions
have infinite variance. More precisely, let $1 < \alpha < 2$ and let $X$ be a
random variable such that $P(X > k) \sim C k^{-\alpha}$ for some $C > 0$.
Consider the following population model.




BETA-COALESCENTS AND CONTINUOUS STABLE RANDOM TREES 






As before, the size of the population is kept constant, equal to $n$. The model
is formulated in discrete time. At each generation, each individual produces a
random number of offspring (distributed like $X$) independently of other
individuals and of the past. Then $n$ of them are randomly chosen to survive
and the others are discarded. One of the main results of [51] is that the
ancestral partitions, suitably rescaled, converge to the
Beta$(2 - \alpha, \alpha)$-coalescent, that is, a $\Lambda$-coalescent such
that the measure $\Lambda$ is the Beta$(2 - \alpha, \alpha)$ distribution.
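
One generation of this population model can be sketched in a few lines. The code below is only an illustration under stated assumptions: the offspring law is one convenient choice with $P(X \ge k) = k^{-\alpha}$ (so $C = 1$), and the names `heavy_tailed_offspring` and `one_generation` are hypothetical, not taken from [51].

```python
import bisect
import itertools
import random

def heavy_tailed_offspring(alpha, rng):
    """Sample X with P(X >= k) = k**(-alpha) for integers k >= 1 (illustrative choice)."""
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return int(u ** (-1.0 / alpha))   # floor of a Pareto(alpha) variable, always >= 1

def one_generation(n, alpha, rng):
    """Return, for each of the n survivors, the label of its parent."""
    counts = [heavy_tailed_offspring(alpha, rng) for _ in range(n)]
    cumulative = list(itertools.accumulate(counts))   # offspring of parents 0..i
    # Choose n distinct offspring uniformly at random without listing them all.
    chosen = rng.sample(range(cumulative[-1]), n)
    # Offspring index c belongs to the first parent whose cumulative count exceeds c.
    return [bisect.bisect_right(cumulative, c) for c in chosen]

rng = random.Random(0)
parents = one_generation(n=50, alpha=1.5, rng=rng)
```

Survivors that share a parent label belong to the same block of the one-step ancestral partition; with $1 < \alpha < 2$, a single parent occasionally accounts for a sizeable fraction of the next generation, which is the source of multiple mergers.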

This connection with population genetics has served both as a motivation
for studying these processes and also as a source of inspiration for a
rich theory that is only now starting to emerge, starting with the series of
seminal papers by Bertoin and Le Gall [10, 11, 12, 13]. In these papers,
$\Lambda$-coalescents are obtained as duals of measure-valued processes called
generalized Fleming-Viot processes. In simple cases (viz., the cases of
quadratic branching and stable branching mechanisms), these processes describe
the composition of a population $(Z_t, t \ge 0)$ undergoing continuous
branching (i.e., $Z$ is a continuous-state branching process, or CSBP for
short; definitions will be given below). This stream of ideas has led Birkner
et al. [14] to prove that one can obtain Beta-coalescents by suitably
time-changing the ancestral partitions associated with the genealogy of
$(Z_t, t \ge 0)$. In this continuous context, it is technically nontrivial to
make rigorous sense of the notion of genealogy, but this is achieved through
the use of a process called the (modified) lookdown process associated with
$(Z_t, t \ge 0)$, a powerful tool introduced by Donnelly and Kurtz [16].

In parallel, it has been known for some time that CSBPs can be viewed
as local time processes of a process $(H_t, t \le T_r)$ called the height
process, in a way that is analogous to the classical theorem of Ray and Knight
for Brownian motion relating the Feller diffusion, the solution of

    $dZ_t = \sqrt{Z_t} \, dW_t$,

where $(W_t)_{t \ge 0}$ is Brownian motion, to the local times of a reflecting
Brownian motion. This connection has been formalized by Le Gall and Le Jan
[36]. The height process itself encodes a continuous random tree, analogous
to the Brownian tree of Aldous [1, 2], and can be viewed as the scaling limit
of suitably normalized Galton-Watson trees. A careful exposition of this rich
theory can be found in [17].

In this paper, we have two main goals. The first is to describe another 
way of thinking about the genealogy of a Beta-coalescent. This is achieved 
by embedding a Beta-coalescent into a continuous random tree with stable 
branching mechanism. To prove this result, we show that one can obtain 
the Donnelly-Kurtz lookdown process from a continuous random tree in 
a very simple fashion. This is valid for a general (sub)critical branching 
mechanism and is of independent interest. From this, and careful analysis, 
it follows that the coalescent tree can be thought of as what is perhaps
the simplest genealogical model: a Galton-Watson tree with a continuous 
time parameter. Our second goal is to use this connection to discuss results 
about the small-time behavior of Beta-coalescents and related processes. 
This study was initiated in [8] without the help of continuous random trees. 
In particular, we apply these ideas to a problem of interest in population 
genetics. 

Organization of the paper. After recalling the necessary definitions and 
results about coalescent processes, CSBPs and continuous random trees in 
Section 2, we state our results in Section 3. In Section 4, we explain our 
construction of the Donnelly-Kurtz lookdown process from a continuous 
random tree. In Section 5, we prove our results related to the small-time be- 
havior of Beta-coalescents, giving asymptotics for the number of blocks and 
the multifractal spectrum. Finally, results concerning biological applications 
are proved in Section 6. 

2. Preliminaries. 

2.1. The $\Lambda$-coalescent. Let $\mathcal{P}_n$ denote the set of all
partitions of the set $\{1, \dots, n\}$ and $\mathcal{P}$ denote the set of all
partitions of $\mathbb{N} = \{1, 2, \dots\}$ (in this paper, it is always
assumed that the set $\mathbb{N}$ does not contain 0). It turns out that the
simplest way to define a coalescent process is by looking at a version of this
process taking its values in the space $\mathcal{P}$. For all partitions
$\pi \in \mathcal{P}$, let $R_n \pi$ be the restriction of $\pi$ to
$\{1, \dots, n\}$, meaning that $R_n \pi \in \mathcal{P}_n$ and that two
integers $i$ and $j$ are in the same block of $R_n \pi$ if and only if they
are in the same block of $\pi$. A $\Lambda$-coalescent (or a coalescent with
multiple collisions) is a $\mathcal{P}$-valued Markov process
$(\Pi(t), t \ge 0)$ such that, for all $n \in \mathbb{N}$, the process
$(R_n \Pi(t), t \ge 0)$ is a $\mathcal{P}_n$-valued Markov chain with the
property that whenever $R_n \Pi(t)$ has $b$ blocks, any particular $k$-tuple
of blocks of this partition merges at a rate equal to $\lambda_{b,k}$, these
being the only possible transitions. The rates $\lambda_{b,k}$ depend neither
on $n$ nor on the numbers of integers in the $b$ blocks. Pitman [45] showed
that the transition rates must satisfy

(1)  $\lambda_{b,k} = \int_0^1 x^{k-2} (1 - x)^{b-k} \, \Lambda(dx)$

for some finite measure $\Lambda$ on $[0, 1]$. The laws of the processes
$R_n \Pi$ are consistent and this allows one to consider a process $\Pi$ such
that the restriction $R_n \Pi$ has the above description. A coalescent process
such that (1) holds for a particular measure $\Lambda$ is called the
$\Lambda$-coalescent.
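
The consistency of the restrictions can be read off from the integral form of the rates. The identity below is a short check added for illustration rather than a statement from the text: among $b + 1$ blocks, a given $k$-tuple of the first $b$ blocks merges either on its own or together with the additional block.

```latex
\[
\lambda_{b+1,k} + \lambda_{b+1,k+1}
= \int_0^1 x^{k-2}(1-x)^{b-k}\bigl[(1-x) + x\bigr]\,\Lambda(dx)
= \lambda_{b,k}, \qquad 2 \le k \le b .
\]
```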

To better understand the role of the measure $\Lambda$, it is useful to have
in mind the following Poissonian construction of a $\Lambda$-coalescent, also
due to Pitman [45]. Suppose $\Lambda$ does not put any mass on $\{0\}$. Let
$(t_i, x_i)_{i \in I}$ be the atoms of a Poisson point process on
$\mathbb{R}_+ \times [0, 1]$ with intensity measure
$dt \otimes x^{-2} \Lambda(dx)$. Observe that although $\Lambda$ is a finite
measure, $x^{-2} \Lambda(dx)$ is not finite in general, but only sigma-finite.
Hence, $(t_i, x_i)_{i \in I}$ may have countably many atoms on any time
interval $[t_1, t_2]$, so in order to make rigorous sense of the following
description, one should again work with restrictions to $\{1, \dots, n\}$.
The coalescent only evolves at times $t$ such that $t = t_i$ for some
$i \in I$. For each cluster present at time $t-$, we flip an independent coin
with probability of heads $x_i$, where $(t_i, x_i)$ is the corresponding atom
of the point process. We merge all the clusters for which the coin came up
heads and do nothing with the other clusters. Hence, we see that in a
$\Lambda$-coalescent where $\Lambda$ has no mass at 0, $x^{-2} \Lambda(dx)$ is
the rate at which a proportion $x$ of the blocks merge (such an event is
generally called an $x$-merger). On the other hand, when $\Lambda$ is a unit
mass at zero, each transition involves the merger of exactly two blocks and
each such transition occurs at rate 1, so this is just Kingman's coalescent.
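
The coin-flipping step of this construction is easy to write down for the restriction to finitely many integers. The following is a minimal sketch of a single $x$-merger, assuming blocks are represented as frozensets; the function name `x_merger` is a hypothetical choice for this illustration.

```python
import random

def x_merger(blocks, x, rng):
    """Apply one x-merger: each block joins the merger independently with probability x."""
    heads, tails = [], []
    for block in blocks:
        (heads if rng.random() < x else tails).append(block)
    if len(heads) < 2:                   # fewer than two heads: nothing merges
        return list(blocks)
    merged = frozenset().union(*heads)   # all blocks that came up heads form one block
    return tails + [merged]

rng = random.Random(1)
start = [frozenset([i]) for i in range(1, 11)]   # restriction to {1, ..., 10}, all singletons
after = x_merger(start, x=0.4, rng=rng)
```

With $b$ blocks present, a given $k$-tuple is exactly the set of heads with probability $x^k (1 - x)^{b-k}$; integrating against the intensity $x^{-2} \Lambda(dx)$ recovers the rate in (1).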

Kingman's theory of exchangeable partitions provides us with a way of
looking at this process as taking its values in the space
$\mathcal{S} = \{x_1 \ge x_2 \ge \cdots \ge 0, \sum_{i \ge 1} x_i \le 1\}$,
which is perhaps a bit more intuitive since the notion of mass is apparent in
this context. The resulting process is called the ranked $\Lambda$-coalescent.
Briefly, partitions of $\mathbb{N}$ defined by the above procedure are
exchangeable, so this implies that for each block of the partition, there
exists a well-defined number called the frequency or mass of the block, which
is the almost sure limiting proportion of integers in this block. Therefore,
given a measure $\Lambda$ and a $\Lambda$-coalescent
$\Pi = (\Pi_t, t \ge 0)$, one can define a process $X = (X(t), t \ge 0)$ with
values in the space $\mathcal{S}$ by taking for each $t > 0$ the frequencies
of $\Pi(t)$ ranked in decreasing order. When $\mathcal{S}$ is endowed with the
topology that it inherits from $\ell^1$, the law $Q_t$ of this process at time
$t$ defines a Markov semigroup with an entrance law: the process enters at
time $0^+$ from a state called dust, that is, the largest frequency vanishes
as $t \to 0$. These technical points are carefully explained in the original
paper of Pitman [45], Theorem 8. The process $X$ is said to have proper
frequencies if $\sum_{i \ge 1} X_i(t) = 1$ for all $t > 0$. Pitman has shown
that this is equivalent to $\int_0^1 x^{-1} \Lambda(dx) = \infty$. This is
also equivalent to the fact that almost surely $\Pi(t)$ does not contain any
singleton, or that all blocks are infinite. Another notion which plays an
important role in this theory is that of coming down from infinity. Pitman
[45] has shown that only two situations occur, depending on the measure
$\Lambda$. Let $E$ be the event that for all $t > 0$, there are infinitely
many blocks and let $F$ be the event that for all $t > 0$, there are only
finitely many blocks. Then, if $\Lambda(\{1\}) = 0$, either $P(E) = 1$ or
$P(F) = 1$. When $P(F) = 1$, the process $X$ or $\Pi$ is said to come down
from infinity. For instance, Kingman's coalescent comes down from infinity,
while if $\Lambda(dx) = dx$ is the uniform measure on $(0, 1)$, the
$\Lambda$-coalescent does not come down from infinity. This particular
choice of $\Lambda$ corresponds to the so-called Bolthausen-Sznitman
coalescent which first arose in connection with spin glasses [15]. For a
necessary and sufficient condition on $\Lambda$ for coming down from infinity,
see [13, 50] and the forthcoming [7]. Note also that a coalescent that comes
down from infinity must have proper frequencies.

In this paper, we will be concerned with the one-parameter family of
coalescent processes called Beta-coalescents. These are the
$\Lambda$-coalescent processes obtained when the measure $\Lambda$ is the
Beta$(2 - \alpha, \alpha)$ distribution with $1 < \alpha < 2$,

    $\Lambda(dx) = \frac{1}{\Gamma(2 - \alpha) \Gamma(\alpha)} x^{1-\alpha} (1 - x)^{\alpha - 1} \, dx$.

The reason we restrict our attention to $1 < \alpha < 2$ is that this
corresponds to the case where the coalescent process comes down from infinity
(a consequence of Schweinsberg's [50] criterion). When $\alpha = 1$, the
Beta$(1, 1)$ distribution is simply the uniform distribution on $(0, 1)$, so
this is the Bolthausen-Sznitman coalescent, which stays infinite. When
$\alpha \to 2$, it can be checked that the Beta$(2 - \alpha, \alpha)$
distribution converges weakly to the unit mass at zero, so, formally, the case
$\alpha = 2$ corresponds to Kingman's coalescent. This family of processes
enjoys some remarkable properties, as can be seen from [14, 51] and results in
the present work. This partly reflects the fact that the continuous-state
branching processes with stable branching mechanism, with which they are
associated (see below), enjoy some strong scale-invariance properties, just
like Brownian motion.
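
For the Beta$(2 - \alpha, \alpha)$ measure, the integral defining $\lambda_{b,k}$ is itself a Beta integral, which gives $\lambda_{b,k} = B(k - \alpha, b - k + \alpha) / B(2 - \alpha, \alpha)$. The sketch below evaluates this closed form; the function names are hypothetical and the formula is a direct computation rather than a quotation from the text.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_rate(b, k, alpha):
    """lambda_{b,k} for Lambda = Beta(2 - alpha, alpha), with 2 <= k <= b and 1 < alpha < 2."""
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))

def total_merger_rate(b, alpha):
    """Total rate at which some merger happens when b blocks are present."""
    return sum(comb(b, k) * beta_rate(b, k, alpha) for k in range(2, b + 1))

# As alpha approaches 2, pair mergers dominate, as in Kingman's coalescent.
near_kingman_pair = beta_rate(5, 2, 1.999)
near_kingman_triple = beta_rate(5, 3, 1.999)
```

Working with `lgamma` rather than `gamma` keeps the computation stable when $2 - \alpha$ is small or $b$ is large.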

2.2. Continuous-state branching processes. Continuous-state branching
processes have been introduced and studied by, among others, Lamperti
[35] and Grey [27]. They are Markov processes $(Z_t, t \ge 0)$ taking their
values in $[0, \infty]$ and we think of $Z_t \ge 0$ as the size of a
continuous population at time $t$. Continuous-state branching processes are
the continuous analogues of Galton-Watson processes as well as their scaling
limits. They are characterized by the following branching property: if
$p_t(x, \cdot)$ denotes the transition probabilities of $Z$ started with
$Z_0 = x$, then for all $x, y \in \mathbb{R}_+$,

(2)  $p_t(x + y, \cdot) = p_t(x, \cdot) * p_t(y, \cdot)$,

which means that the process started from $x + y$ individuals has the same law
as the sum of a process started from $x$ and one started from $y$
independently. The interpretation of (2) is that if individuals live and
reproduce independently, then a population started from $x + y$ individuals
should evolve as the sum of two independent populations, one started with $x$
individuals and one with $y$ individuals. Lamperti [35] has shown that a
continuous-state branching process is characterized by a function
$\psi : [0, \infty) \to \mathbb{R}$ called the branching mechanism, such that
for all $t \ge 0$, the Laplace transform of $Z_t$ satisfies

(3)  $E[e^{-\lambda Z_t} \mid Z_0 = a] = e^{-a u_t(\lambda)}$,

where the function $u_t(\lambda)$ solves the differential equation

(4)  $\frac{\partial u_t(\lambda)}{\partial t} = -\psi(u_t(\lambda)), \qquad u_0(\lambda) = \lambda$.

Moreover, the branching mechanism $\psi$ is the Laplace exponent of some
spectrally positive Lévy process (i.e., a Lévy process with no negative
jumps). That is, there exist a measure $\nu$ on $(0, \infty)$ and some numbers
$a \in \mathbb{R}$ and $b \ge 0$ such that for all $q \ge 0$,

(5)  $\psi(q) = aq + bq^2 + \int_0^\infty (e^{-qx} - 1 + qx \mathbf{1}_{\{x < 1\}}) \, \nu(dx)$

and $\int_0^\infty (1 \wedge x^2) \, \nu(dx) < \infty$. Furthermore, if
$(Y_t, t \ge 0)$ is the Lévy process with Laplace exponent $\psi$, that is,

    $E[e^{-q Y_t}] = e^{t \psi(q)}$,

then the distributions of $(Z_t, t \ge 0)$ and $(Y_t, t \ge 0)$ are related by
a simple time-change (sometimes called the Lamperti transform). Let

    $U_t = \int_0^t \bar{Y}_s^{-1} \, ds$,

where $(\bar{Y}_t, t \ge 0)$ is the process $(Y_t, t \ge 0)$ stopped when it
first hits zero, and call $U_t^{-1}$ the càdlàg inverse of $U_t$. Then
$(\bar{Y}_{U_t^{-1}}, t \ge 0)$ has the same law as $Z$. We refer the reader
to, for instance, [9] for more information about this.

When $\psi(q) = q^\alpha$ for some $\alpha \in (0, 2]$, we say that the CSBP
has a stable branching mechanism. When $\alpha = 2$, the process $Z$ is
Feller's diffusion and the Lévy process in Lamperti's transformation is
standard Brownian motion. When $1 < \alpha < 2$, this branching mechanism
arises by taking $a = b = 0$ and

    $\nu(dx) = \frac{\alpha (\alpha - 1)}{\Gamma(2 - \alpha)} x^{-1-\alpha} \, dx$

in (5). The Lévy process in Lamperti's transformation is an $\alpha$-stable
Lévy process having the scaling property

    $(Y_{\lambda t}, t \ge 0) =_d (\lambda^{1/\alpha} Y_t, t \ge 0)$.
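
For the stable branching mechanism $\psi(q) = q^\alpha$ with $\alpha > 1$, the differential equation (4) can be solved by separation of variables, giving $u_t(\lambda) = (\lambda^{1-\alpha} + (\alpha - 1) t)^{-1/(\alpha - 1)}$. The sketch below compares this closed form with a generic numerical integration of (4); the function names are hypothetical and the Runge-Kutta scheme is only a sanity check, not part of the theory.

```python
def u_closed_form(t, lam, alpha):
    """Solution of du/dt = -u**alpha with u_0 = lam, valid for alpha > 1."""
    return (lam ** (1 - alpha) + (alpha - 1) * t) ** (-1 / (alpha - 1))

def u_numeric(t, lam, alpha, steps=10_000):
    """Classical fourth-order Runge-Kutta integration of du/dt = -psi(u), psi(q) = q**alpha."""
    def psi(q):
        return q ** alpha
    h, u = t / steps, lam
    for _ in range(steps):
        k1 = -psi(u)
        k2 = -psi(u + 0.5 * h * k1)
        k3 = -psi(u + 0.5 * h * k2)
        k4 = -psi(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return u

exact = u_closed_form(1.0, 2.0, 1.5)
approx = u_numeric(1.0, 2.0, 1.5)
```

The closed form makes the scale invariance visible: $u_{ct}(\lambda) = c^{-1/(\alpha - 1)} u_t(c^{1/(\alpha - 1)} \lambda)$ for every $c > 0$.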
2.3. The height process and continuous random trees. Le Gall and Le
Jan [36] have introduced a new way of thinking about CSBPs, which was
further carefully explored by Duquesne and Le Gall in [17]. It is inspired
by the well-known result of Ray and Knight on the local times of Brownian
motion and is related to the construction of the Brownian continuum random
tree of Aldous [1, 2]. Recall that if $B$ is a reflecting Brownian motion,
$(L_t^x, t \ge 0, x \ge 0)$ is a jointly continuous version of its local times
and $(T_r, r \ge 0)$ is the càdlàg inverse of $L^0$, then for fixed $r > 0$,
the process $(L_{T_r}^x, x \ge 0)$ is a Feller diffusion started with initial
population $r$. Le Gall and Le Jan have introduced a process
$(H_t, t \ge 0)$ which generalizes the Ray-Knight theorem to continuous-state
branching processes with (sub)critical branching mechanism.

More precisely, consider a Laplace exponent $\psi(q)$ and a $\psi$-CSBP
$(Z_t, t \ge 0)$. We will assume that $\psi$ is subcritical, that is, a.s.
there exists some time $0 < \tau < \infty$ such that $Z_\tau = 0$. Grey has
shown that this is equivalent to the condition that the branching mechanism
$\psi$ satisfies

    $\int_1^\infty \frac{dq}{\psi(q)} < \infty$.

In particular, this is the case when $\psi(q) = q^2/2$ or when
$\psi(q) = q^\alpha$ for $1 < \alpha < 2$. Lamperti [35] has shown that there
exists a sequence of offspring distributions $(\mu_n)$ such that if we
consider $(Z_k^n, k = 1, 2, \dots)$, a discrete Galton-Watson process with
offspring distribution $\mu_n$ and started with $n$ individuals, then
$(n^{-1} Z^n_{[\gamma_n t]}, t \ge 0)$ converges in the sense of
finite-dimensional distributions to $(Z_t, t \ge 0)$, where the $\gamma_n$ are
suitable time-scaling constants. If we ask for finer limit theorems about the
genealogy of $(Z_t, t \ge 0)$, then Duquesne and Le Gall have shown that the
discrete height process $(H_k^n, k = 0, 1, \dots)$, where $H_k^n$ is the
generation of the $k$th individual, converges when suitably normalized to a
process $(H_t, t \ge 0)$ called the height process. One may directly construct
this process $(H_t, t \ge 0)$ from a Lévy process with Laplace exponent
$\psi$. Thus, informally, the height process plays the same role as the
depth-first search process on a discrete tree, but in a continuous setting. An
important result of Duquesne and Le Gall [17] is that, even though $H$ is, in
general, neither a semimartingale nor a Markov process, $H$ admits a local
time process, that is, almost surely, there exists a jointly continuous
process $(L_s^a, s \ge 0, a \ge 0)$ such that for all $t \ge 0$,



They were also able to prove that the process H has a continuous modifica- 
tion provided the branching mechanism is subcritical. 

The importance of the process $H$ stems primarily from the generalized
Ray-Knight theorem, which we now state (see [17, 36]). Let
$T_r = \inf\{t > 0 : L_t^0 > r\}$ and let $Z_t = L_{T_r}^t$ for $t \ge 0$.
Then $(Z_t, t \ge 0)$ is a $\psi$-CSBP started at $Z_0 = r$. If
$\psi(q) = q^2/2$, then $(H_t, t \ge 0)$ has the law of a reflecting Brownian
motion and $(Z_t, t \ge 0)$ is the Feller diffusion, as the classical
Ray-Knight theorem states.

3. Main results. 

3.1. The Beta-coalescent in the continuous stable random tree. Our first
result is the embedding of a Beta$(2 - \alpha, \alpha)$-coalescent for
$1 < \alpha < 2$ in the tree coded by the $\alpha$-stable height process. Let
$Z$ be an $\alpha$-stable CSBP obtained in the fashion of Duquesne and Le Gall
from the height process $(H_t, 0 \le t \le T_r)$ associated with
$\psi(q) = q^\alpha$ for a given $1 < \alpha < 2$, that is,
$Z_t = L_{T_r}^t$. Consider, for all $t$, the random level



and let $R^{-1}(t) = \inf\{s : R_s > t\}$. It follows from [14] that
$R^{-1}(t) < \infty$ a.s. for all $t$ and that
$\lim_{t \to \infty} R^{-1}(t) = \zeta$, where $\zeta$ is the lifetime of the
CSBP.

Let $(V_i, i = 1, 2, \dots)$ be a sequence of variables in $(0, T_r)$ defined
such that for all $i \in \mathbb{N}$, $V_i$ is the left endpoint of the $i$th
highest excursion of the height process $H$ above the level $R^{-1}(t)$. Next,
we define a process $(\Pi_s, 0 \le s \le t)$ which takes its values in the
space $\mathcal{P}$ of partitions of $\mathbb{N}$ as follows:

    $i \sim_{\Pi_s} j \iff \min_{r \in [V_i \wedge V_j, V_i \vee V_j]} H_r > R^{-1}(t - s)$.

That is, $i$ and $j$ are in the same block of $\Pi_s$ if and only if $V_i$ and
$V_j$ are in the same excursion of $H$ above level $R^{-1}(t - s)$.

Theorem 1. The process $(\Pi_s, 0 \le s \le t)$ is a
Beta$(2 - \alpha, \alpha)$-coalescent run for time $t$.

Another way of looking at this result is to consider the ranked coalescent.
Let $(X(s), 0 \le s \le t)$ be the process with values in $\mathcal{S}$
defined by the following procedure. For each $s \le t$, $X(s)$ has as many
nonzero coordinates as there are excursions of the height process above
$R^{-1}(t - s)$ that reach the level $R^{-1}(t)$. To each such excursion we
associate a mass given by the local time of that excursion at level
$R^{-1}(t)$, normalized by $Z_{R^{-1}(t)}$ so that the sum is equal to 1. Then
$X(s)$ is defined as the nonincreasing rearrangement of these masses.

Corollary 2. $(X(s), 0 \le s \le t)$ has the same distribution as the ranked
Beta$(2 - \alpha, \alpha)$-coalescent run for time $t$.
Fig. 1. A Beta-coalescent is obtained by coalescing excursions of
$(H_t, t \le T_r)$ above $R^{-1}(t - s)$ that reach $R^{-1}(t)$. Thus each
excursion corresponds to a block of the coalescent and its mass is given by
its local time at level $R^{-1}(t)$.

We picture the coalescent as the following process. As $s$ increases from 0
to $t$, the level $R^{-1}(t - s)$ decreases from $R^{-1}(t)$ to 0. The
excursions of $H$ above level $R^{-1}(t - s)$ coalesce because if
$s_1 < s_2$, then several excursions of $H$ above the level
$R^{-1}(t - s_1)$ could be part of the same excursion of $H$ above the level
$R^{-1}(t - s_2)$. This will happen, for example, if the excursion of $H$
above the level $R^{-1}(t - s_1)$ has a local minimum at the level
$R^{-1}(t - s_2)$. Then, in the corresponding coalescent process, we observe a
merging of masses at time $s_2$ corresponding to the fraction of local time at
$R^{-1}(t)$ contained by each of those excursions.
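
The way excursions coalesce as the level is lowered can be illustrated on a discretized height function. The sketch below is a toy analogue under stated assumptions: `H` is a hypothetical lattice path, the marked times `V` are simply times at which the path sits at the top level (rather than left endpoints of excursions), and plain levels stand in for $R^{-1}(t - s)$.

```python
def partition_at_level(height, marks, level):
    """Group the marked times: two marks share a block if and only if the height
    function stays strictly above `level` on the whole stretch between them."""
    marks = sorted(marks)
    blocks, current = [], [marks[0]]
    for left, right in zip(marks, marks[1:]):
        if min(height[left:right + 1]) > level:
            current.append(right)        # same excursion above `level`
        else:
            blocks.append(current)       # the path dipped to `level` or below
            current = [right]
    blocks.append(current)
    return blocks

# Hypothetical discretized height function and marked times at height 3.
H = [0, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
V = [3, 5, 9, 15]

coarsening = [partition_at_level(H, V, level) for level in (2, 1, 0)]
```

Lowering the level from 2 to 0 produces successively coarser partitions of the marked times, mirroring how blocks of $\Pi_s$ merge as $s$ increases. Checking only consecutive marks suffices because the minimum over a longer stretch is the minimum of the minima over its pieces.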

Remark 3. Recall the definition of an $\mathbb{R}$-tree associated with a
nonnegative function $H$ defined on an interval $[0, T_r]$. If
$d_H(u, v) = H(u) + H(v) - 2 \inf_{u \le t \le v} H(t)$, then $d_H$ is a
pseudodistance on $[0, T_r]$. Equipped with $d_H$, the quotient of
$[0, T_r]$ by the relation $d_H(u, v) = 0$ is an $\mathbb{R}$-tree. For the
function $(H_s, s \le T_r)$, this gives a Poissonian collection of scaled
stable trees joined at the root. In this context, the $V_i$ are certain
vertices at distance $R^{-1}(t)$ from the root and the state of the coalescent
at time $s$ can be described as the partition obtained by declaring
$i \sim j$ if and only if their most recent common ancestor is at distance
greater than $R^{-1}(t - s)$ from the root, that is, if
$d_H(V_i, V_j) < 2 (R^{-1}(t) - R^{-1}(t - s))$. In other words, if we define
a new distance $d_\Pi$ on $\mathbb{N}$ by

    $d_\Pi(i, j) = \inf\{s : R^{-1}(t) - R^{-1}(t - s) = d_H(V_i, V_j)/2\}$,

then the classes of $\Pi_s$ are the balls of radius $s$ for the metric
$d_\Pi$.

3.2. Small-time behavior and multifractal spectrum. We now use Theorem 1
to obtain several results about the small-time behavior of the
Beta-coalescents.

Let $N(t)$ be the number of blocks at time $t$ of a Beta-coalescent
$\Pi(t)$. Our first application gives the almost sure limit behavior of
$N(t)$ and has already been shown in [8] using methods based on the analysis
of CSBPs with stable branching mechanisms.

Theorem 4.

    $\lim_{t \to 0} t^{1/(\alpha - 1)} N(t) = (\alpha \Gamma(\alpha))^{1/(\alpha - 1)}$ a.s.



For an exchangeable random partition, the number of blocks is related to
the typical block size. For instance, suppose $\Pi$ is an exchangeable random
partition and that $|\Pi|$ denotes the number of blocks of $\Pi$. Using
equation (2.27) in [46], we see that if $X_1$ is the asymptotic frequency of
the block of $\Pi$ containing 1, then $E(|\Pi|) = E(X_1^{-1})$. Hence, here,
at least informally, we see that the frequency of the block which contains 1
at time $t$ must be of the order of $1/N(t) \propto t^{1/(\alpha - 1)}$ (this
result was proved rigorously in [8]). Put another way, this says that almost
all of the fragments emerge from the original dust by growing like
$t^{1/(\alpha - 1)}$. We say that $1/(\alpha - 1)$ is the typical speed of
emergence.
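
As a concrete instance of the limit $t^{1/(\alpha - 1)} N(t) \to (\alpha \Gamma(\alpha))^{1/(\alpha - 1)}$, the constant can be evaluated in closed form when $\alpha = 3/2$. The computation below is an illustrative worked example and uses only $\Gamma(3/2) = \sqrt{\pi}/2$.

```latex
\[
\alpha = \tfrac32:\qquad
\bigl(\alpha\,\Gamma(\alpha)\bigr)^{1/(\alpha-1)}
= \left(\tfrac32\cdot\tfrac{\sqrt{\pi}}{2}\right)^{2}
= \tfrac{9\pi}{16} \approx 1.767,
\qquad\text{so}\qquad
N(t) \sim \tfrac{9\pi}{16}\, t^{-2} \quad (t \downarrow 0).
\]
```

In this case a typical block has frequency of order $t^{2}$, while the largest block, discussed next, is of the much larger order $t^{2/3}$.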

However, some blocks clearly have a different behavior. Consider, for
instance, the largest block and denote by $W(t)$ its frequency at time $t$. It
was shown in [8], Proposition 1.6, that

    $(\alpha \Gamma(\alpha) \Gamma(2 - \alpha))^{1/\alpha} t^{-1/\alpha} W(t) \to_d X$ as $t \downarrow 0$,

where $X$ has the Fréchet distribution of index $\alpha$. Hence, the size of
the largest fragment is of the order of $t^{1/\alpha}$.

This suggests studying the existence of fragments that emerge with an
atypical rate $\gamma \ne 1/(\alpha - 1)$. To do so, it is convenient to
consider a random metric space $(S, d)$ which completely encodes the
coalescent $\Pi$ (this space was introduced by Evans [23] in the case of
Kingman's coalescent). The space $(S, d)$ is the completion of the space
$(\mathbb{N}, d)$, where $d(i, j)$ is the time at which the integers $i$ and
$j$ coalesce. Completing the space $\{1, 2, \dots\}$ with respect to this
distance in particular adds points that belong to blocks behaving atypically.
In this framework, we are able to associate with each point $x \in S$ and each
$t > 0$ a positive number $\eta(x, t)$ which is equal to the frequency of the
block at time $t$ corresponding to $x$. (This is formally achieved by endowing
$S$ with a mass measure $\eta$.) In this setting, we can reformulate the
problem as follows: are there points $x \in S$ such that the block $B_x(t)$
that contains $x$ at time $t$ behaves as $t^\gamma$ when $t \to 0$ or, more
formally, such that $\eta(x, t) \asymp t^\gamma$? [Here,
$f(t) \asymp g(t)$ means that $\log f(t) / \log g(t) \to 1$.] Also, how many
such points typically exist?
We define, for $\gamma \le 1/(\alpha - 1)$,

    $S(\gamma) = \{x \in S : \liminf_{t \to 0} \frac{\log \eta(x, t)}{\log t} \le \gamma\}$

and, similarly, when $\gamma > 1/(\alpha - 1)$,

    $S(\gamma) = \{x \in S : \limsup_{t \to 0} \frac{\log \eta(x, t)}{\log t} \ge \gamma\}$.

When $\gamma \le 1/(\alpha - 1)$, $S(\gamma)$ is the set of points which
correspond to large fragments. On the other hand, when
$\gamma > 1/(\alpha - 1)$, $S(\gamma)$ is the set of points which correspond
to small fragments. In the next result, we answer the question raised above by
computing the Hausdorff dimension (with respect to the metric of $S$) of the
set $S(\gamma)$.



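Theorem 5.

1. If $\frac{1}{\alpha} \le \gamma \le \frac{1}{\alpha - 1}$, then

    $\dim_{\mathcal{H}} S(\gamma) = \gamma \alpha - 1$.

If $\gamma < \frac{1}{\alpha}$, then $S(\gamma) = \emptyset$ a.s., but
$S(1/\alpha) \ne \emptyset$ almost surely.

2. If $\frac{1}{\alpha - 1} < \gamma \le \frac{\alpha}{(\alpha - 1)^2}$, then

    $\dim_{\mathcal{H}} S(\gamma) = \frac{\alpha}{\gamma (\alpha - 1)^2} - 1$.

If $\gamma > \frac{\alpha}{(\alpha - 1)^2}$, then $S(\gamma) = \emptyset$
a.s., but $S(\alpha/(\alpha - 1)^2) \ne \emptyset$ almost surely.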



Remark 6. The maximal value of $\dim_{\mathcal{H}} S(\gamma)$ is obtained
when $\gamma = 1/(\alpha - 1)$, in which case the dimension of $S(\gamma)$ is
also equal to $1/(\alpha - 1)$. This was to be expected since this is the
typical exponent for the size of a block. The value of the dimension then
corresponds to the full dimension of the space $S$, as was proved in [8],
Theorem 1.7.

Remark 7. We recover part of Proposition 1.6 in [8] that the largest
block has size of order $t^{1/\alpha}$ since this is the smallest $\gamma$ for
which $S(\gamma) \ne \emptyset$. It may be a bit more surprising that there is
such a thing as a notion of smallest block, whose size is of order
$t^\gamma$, where $\gamma = \alpha/(\alpha - 1)^2$.
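
The two branches of the spectrum in Theorem 5 are easy to tabulate. The function below is a small sketch that simply transcribes the formulas of the theorem; the name `spectrum` and the convention of returning `None` when $S(\gamma)$ is almost surely empty are choices made for this illustration.

```python
def spectrum(gamma, alpha):
    """Hausdorff dimension of S(gamma) as stated in Theorem 5, for 1 < alpha < 2.
    Returns None when S(gamma) is almost surely empty."""
    typical = 1 / (alpha - 1)
    lower, upper = 1 / alpha, alpha / (alpha - 1) ** 2
    if gamma < lower or gamma > upper:
        return None
    if gamma <= typical:
        return gamma * alpha - 1                      # large fragments
    return alpha / (gamma * (alpha - 1) ** 2) - 1     # small fragments

alpha = 1.5
peak = spectrum(1 / (alpha - 1), alpha)   # value at the typical exponent
```

For $\alpha = 3/2$ the spectrum is supported on $[2/3, 6]$, vanishes at both endpoints and peaks at $\gamma = 2$ with value $1/(\alpha - 1) = 2$, where the two branches agree.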






Fig. 2. Multifractal spectrum map $\gamma \mapsto \dim_{\mathcal{H}} S(\gamma)$.
The left-derivative at the critical point is $\alpha$ while the
right-derivative is $-\alpha$.

Remark 8. This is reminiscent of the problem considered in [6], in
which the long-time asymptotic behavior of homogeneous fragmentations
was studied. More precisely, it was shown there that if $F(t)$ is a
homogeneous fragmentation of the interval $(0, 1)$ and $I_x(t)$ denotes the
fragment that contains $x$ at time $t$, then there is a typical speed of
fragmentation $v_0$, in the sense that if $U$ is uniform on $(0, 1)$, then
almost surely $|I_U(t)| \approx e^{-v_0 t}$.

However, for $v \ne v_0$ in some range, the random set of exceptional points
$S(v) := \{x \in (0, 1) : |I_x(t)| \approx e^{-vt}\}$ is nonempty and has zero
Lebesgue measure. The main result in [6] gives an explicit expression of the
multifractal spectrum map $v \mapsto \dim_{\mathcal{H}}(S(v))$, where
$\dim_{\mathcal{H}}(S)$ denotes the Hausdorff dimension of $S$. However, we
emphasize that in [6], this Hausdorff dimension is computed with respect to
the metric induced by the Lebesgue measure on $(0, 1)$. In that case, the fact
that the diameter of a block is equal to its mass plays a significant role. By
contrast, here, we compute dimensions with respect to the metric $d$, which
should rather be understood as a genealogical distance.

3.3. Frequency spectra for mutation models. We now describe a result 
concerning Beta-coalescents which has some applications to a question aris- 
ing in population genetics. The question is concerned with the quantification 
of polymorphism in a sample of given size taken from a population. Sup- 
pose we sample n individuals from a population at a certain time. Due to 
mutations, at a given locus, not all individuals in this sample will have the 
same allele. Moreover, mutations also affect different sites. We may ask sev- 
eral questions. In the sample of size n, how many different alleles should we 
observe at a given locus (site)? On how many sites should we expect to see 
different alleles? With which frequency should each of the different alleles be 
represented? As we will see, the answers to these questions depend heavily 
on the nature of the population, particularly on its reproduction mechanism, 
in addition to the mutation rate. 




Fig. 3. In the infinite sites model, each mark stands for a mutation that
affects a different locus. In this example, there are four families:
$\{1, 4\}$, $\{4\}$, $\{2, 3, 5, 6\}$ and $\{2, 5, 6\}$. On the other hand, in
the infinite alleles model, the allelic partition $\Pi_\theta$ also has four
blocks: $\{1\}$, $\{2, 5, 6\}$, $\{3\}$ and $\{4\}$.



To make the problem mathematically tractable, we will consider two sim- 
plified models. The rate at which mutations occur will always be assumed 
to be a positive number $\theta$, constant with time. In the first model, called the
infinite alleles model, introduced by Kimura and Crow [32] in 1964, we study 
a given locus in the sample and assume that each mutation has resulted in 
a new allele. This means that the descendants of an individual affected by 
a mutation all carry the same allele except those later affected by another 
mutation. In the second model, called the infinite sites model, introduced by 
Kimura [31] in 1969, we look at the number of sites where we expect individ- 
uals to show polymorphism. In this model, we assume that each mutation 
occurs at a new site. In particular, if an individual is affected by a mutation, 
all the descendants of this individual carry this mutation. See Figure 3 for 
an illustration of these two models. 

In the infinite alleles model, one can define the so-called allelic partition.
That is, one may divide the sample into groups of individuals having the
same allele at the observed locus. For a sample of size $n$, quantities of interest
include the number of different groups (which we will also refer to as clusters
or blocks), $N(n)$, as well as typical sizes of groups: we denote
by $N_k(n)$ the number of blocks of size $k$ in the allelic partition. In the infinite
sites model, one cannot define a partition of the sample because a given individual
in the sample may have been affected by several mutations. However,
we can still define $M(n)$ to be the total number of mutations and $M_k(n)$ to be
the number of these mutations affecting exactly $k$ individuals in the sample.
For example, in Figure 3, $N(n) = 4$, $N_1(n) = 3$, $N_2(n) = 0$, $N_3(n) = 1$, while
$M(n) = 4$ and $M_1(n) = M_2(n) = M_3(n) = M_4(n) = 1$. The whole sequence
$(M_1(n), \ldots, M_n(n))$ is called the site frequency spectrum and the sequence
$(N_1(n), \ldots, N_n(n))$ is called the allele frequency spectrum.

A fundamental result in this domain is the celebrated Ewens sampling 
formula [24]. This result gives an explicit formula for the distribution of 
the allelic partition, under some standard assumptions on the reproduction 
mechanism of the population. The result is perhaps best explained through 
the theory of Kingman's coalescent. Based on this process, Kingman [33] was 
able to find a simpler proof of Ewens' sampling formula. Assume that the 
genealogy of the population may be described by the dynamics of Kingman's 
coalescent, that is, each pair of lineages coalesces at rate 1. Assuming the 
rate of mutations is $\theta/2$ along every lineage, the Ewens sampling formula
states that the probability that the allelic partition has $a_j$ blocks of size $j$
for $j = 1, \ldots, n$ is

(8) $$P(N_1(n) = a_1, \ldots, N_n(n) = a_n) = p(a_1, \ldots, a_n) = \frac{n!}{\theta_{(n)}} \prod_{j=1}^n \frac{\theta^{a_j}}{j^{a_j} a_j!},$$

where $\theta_{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$. This formula has since played an important
role in many different areas of probability theory, sometimes fairly
distant from the original application to population genetics. Among many 
others, we refer the reader to [4] and to [28] for different proofs of (8). 
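
As a sanity check on the Ewens sampling formula, the following minimal sketch evaluates $\frac{n!}{\theta_{(n)}} \prod_j \theta^{a_j} / (j^{a_j} a_j!)$ and sums it over all admissible vectors $(a_1, \ldots, a_n)$. The function names are illustrative only and do not come from the paper.

```python
from math import factorial, prod

def rising_factorial(theta, n):
    """theta_(n) = theta (theta + 1) ... (theta + n - 1)."""
    return prod(theta + i for i in range(n))

def ewens_probability(counts, theta):
    """Probability that the allelic partition has counts[j - 1] blocks of size j."""
    n = sum(j * a for j, a in enumerate(counts, start=1))
    weight = 1.0
    for j, a in enumerate(counts, start=1):
        weight *= theta ** a / (j ** a * factorial(a))
    return factorial(n) / rising_factorial(theta, n) * weight

def block_count_vectors(n):
    """Yield every (a_1, ..., a_n) with sum_j j * a_j = n."""
    def rec(j, remaining):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for a in range(remaining // j + 1):
            for rest in rec(j + 1, remaining - j * a):
                yield (a,) + rest
    yield from rec(1, n)

# The probabilities of all allelic configurations of a sample of size 6 add up to one.
total = sum(ewens_probability(c, theta=1.5) for c in block_count_vectors(6))
```

For a sample of size 2 with $\theta = 1$, `ewens_probability((0, 1), 1.0)` gives the probability $1/(1 + \theta) = 1/2$ that both individuals carry the same allele.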

Unfortunately, the methods used to prove (8) do not seem to apply to the
more general framework of $\Lambda$-coalescents. In fact, there are very few explicit
results studying the structure of a sample of the population in this context.
Let us mention, in particular, the work of Möhle [39], Theorem 3.1, who obtains
a recursive formula for the allele frequency spectrum. However, this recursion
is intricate enough that it is difficult to use in practice for moderately
large sample sizes.

We present here an asymptotic formula for the frequency spectrum, both
in the infinite alleles and the infinite sites models, as the sample size $n \to \infty$.
We work under the assumption that the genealogy of the population can be
described by a $\Lambda$-coalescent $(\Pi_t, t \ge 0)$. We focus on the case where the
measure $\Lambda$ is the Beta$(2 - \alpha, \alpha)$ distribution and $1 < \alpha < 2$. We assume that
mutations occur at constant rate $\theta > 0$.
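Theorem 9. Assume that $\Lambda$ has the Beta$(2 - \alpha, \alpha)$ distribution with
$1 < \alpha < 2$. Fix a positive integer $k$. Then

$$n^{\alpha - 2} M_k(n) \to_p \theta \alpha (\alpha - 1)^2 \frac{\Gamma(k + \alpha - 2)}{k!}$$

and

$$n^{\alpha - 2} N_k(n) \to_p \theta \alpha (\alpha - 1)^2 \frac{\Gamma(k + \alpha - 2)}{k!},$$

where $\to_p$ denotes convergence in probability as $n \to \infty$.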



Remark 10. To understand where these results come from, recall that
in Theorem 1.9 of [8], we showed that

(9) $$n^{\alpha - 2} M(n) \to_p \frac{\theta \alpha (\alpha - 1) \Gamma(\alpha)}{2 - \alpha}.$$

In Section 5, we will show that for small times, the Beta$(2 - \alpha, \alpha)$-coalescent
can be approximately described by the genealogy of a continuous-time branching
process in which individuals live for an exponential amount of time with
mean 1 and then have a number of offspring distributed according to $\chi$,
where $P(\chi = 0) = P(\chi = 1) = 0$ and where, for $k \ge 2$, we have

(10) $$P(\chi = k) = \frac{\alpha (2 - \alpha)(3 - \alpha) \cdots (k - 1 - \alpha)}{k!} = \frac{\alpha \Gamma(k - \alpha)}{k! \, \Gamma(2 - \alpha)}.$$

This offspring distribution is supercritical with mean $1 + 1/(\alpha - 1)$. We will
show that if $\tau$ is an independent exponential random variable with mean
$1/c$, where $c = (2 - \alpha)/(\alpha - 1) > 0$, and $k$ is a positive integer, then

(11) $$M_k(n) \sim_p M(n) P(\xi_\tau = k),$$

where $\sim_p$ means that the ratio of the two sides converges to 1 in probability
as $n \to \infty$.

This result, and the analogous result for $N_k(n)$, will imply Theorem 9.
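
The offspring law of Remark 10, $P(\chi = k) = \alpha \Gamma(k - \alpha) / (k! \, \Gamma(2 - \alpha))$ for $k \ge 2$, can be checked numerically. The sketch below is only an illustration: it uses the recursion $p_2 = \alpha/2$, $p_{k+1} = p_k (k - \alpha)/(k + 1)$, which follows from the Gamma-function form, and the truncation level `k_max` is an arbitrary choice made here.

```python
def offspring_pmf(alpha, k_max):
    """Return [P(chi = k) for k = 0, ..., k_max].

    Uses p_2 = alpha / 2 and p_{k+1} = p_k * (k - alpha) / (k + 1),
    which is equivalent to alpha * Gamma(k - alpha) / (k! * Gamma(2 - alpha)).
    """
    pmf = [0.0, 0.0]  # P(chi = 0) = P(chi = 1) = 0
    p = alpha / 2.0
    for k in range(2, k_max + 1):
        pmf.append(p)
        p *= (k - alpha) / (k + 1)
    return pmf

alpha = 1.5
pmf = offspring_pmf(alpha, 100_000)
total_mass = sum(pmf)
# The truncated mean approaches 1 + 1/(alpha - 1) slowly, since the tail decays like k^(-alpha).
truncated_mean = sum(k * p for k, p in enumerate(pmf))
```

With $\alpha = 1.5$ the total mass is numerically 1 and the truncated mean is close to $1 + 1/(\alpha - 1) = 3$; the slow convergence of the mean reflects the heavy tail of $\chi$.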



Remark 11. One can only observe $M_k(n)$ from biological data if the ancestral
type is known. Otherwise, it is necessary to work with the "wrapped
frequency spectrum" $\tilde{M}_k(n) = M_k(n) + M_{n-k}(n)$. For fixed $k \ge 1$, one can
see from (11) that as $n \to \infty$, these two quantities have the same asymptotics
because the limiting values of $M_k(n)/M(n)$ sum to one and therefore
$M_{n-k}(n)/M(n)$ goes to zero in probability as $n \to \infty$.
Remark 12. It is natural that the distribution (10) arises in this context
because when the Beta$(2 - \alpha, \alpha)$-coalescent has $b$ blocks, the probability
that its next merger involves $k$ blocks converges to $P(\chi = k)$ as $b \to \infty$ (see
[8, 13]). Of course, an individual having $k$ offspring in the Galton-Watson
process corresponds to a merger of $k$ blocks in the corresponding coalescent
process going backward in time.
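
The convergence mentioned in Remark 12 can be observed numerically. The sketch below relies on an assumption not stated in this section, namely the standard expression for $\Lambda$-coalescent rates: when there are $b$ blocks, each given $k$-tuple merges at rate $\lambda_{b,k} = \int_0^1 x^{k-2} (1 - x)^{b-k} \Lambda(dx)$, which for the Beta$(2 - \alpha, \alpha)$ measure is proportional to the Beta function $B(k - \alpha, b - k + \alpha)$. Function names are illustrative.

```python
from math import exp, lgamma

def merger_size_pmf(alpha, b):
    """Return {k: P(next merger involves k of the b blocks)} for 2 <= k <= b."""
    log_w = {}
    for k in range(2, b + 1):
        log_binom = lgamma(b + 1) - lgamma(k + 1) - lgamma(b - k + 1)
        # log B(k - alpha, b - k + alpha); the constant 1 / B(2 - alpha, alpha) cancels.
        log_beta = lgamma(k - alpha) + lgamma(b - k + alpha) - lgamma(b)
        log_w[k] = log_binom + log_beta
    shift = max(log_w.values())  # subtract the maximum to avoid overflow
    w = {k: exp(v - shift) for k, v in log_w.items()}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

def limit_pmf(alpha, k):
    """P(chi = k) = alpha * Gamma(k - alpha) / (k! * Gamma(2 - alpha))."""
    return alpha * exp(lgamma(k - alpha) - lgamma(k + 1) - lgamma(2 - alpha))

pmf_b = merger_size_pmf(1.5, 5000)
gap_k2 = abs(pmf_b[2] - limit_pmf(1.5, 2))
gap_k3 = abs(pmf_b[3] - limit_pmf(1.5, 3))
```

For $\alpha = 1.5$ and $b = 5000$ both gaps are small, in line with the limit $P(\chi = 2) = \alpha/2$ and $P(\chi = 3) = \alpha(2 - \alpha)/6$.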

Remark 13. The limiting behavior described in Theorem 9 also arises
in the theory of exchangeable partitions. Following Lemma 3.11 in Pitman
[46], let $\Pi$ be an exchangeable random partition whose ranked asymptotic
frequencies $P_i$ satisfy

(12) $$P_i \sim Z i^{-1/(2 - \alpha)}$$

almost surely for some random variable $Z$ such that $0 < Z < \infty$. Then if
$|\Pi_n|$ (resp., $|\Pi_{n,j}|$) is the number of blocks (resp., number of blocks of size
$j$) of $\Pi$ restricted to $\{1, \ldots, n\}$, we have

(13) $$|\Pi_n| \sim S_\alpha n^{2 - \alpha}$$

almost surely for a random variable $S_\alpha$ determined explicitly from $Z$. Moreover,

(14) $$\frac{|\Pi_{n,k}|}{|\Pi_n|} \to \frac{(2 - \alpha) \Gamma(k + \alpha - 2)}{\Gamma(\alpha - 1) k!}.$$

In fact, it follows from an unpublished work of Hansen and Pitman [26] that
(13) implies (12), which, in turn, using Lemma 3.11 of [46], implies (14). Note
that the distribution on the right-hand side of (14) previously appeared in
the context of urn schemes in the work of Karlin [30]. See also [48], and see
[43, 44], where this distribution occurs in the context of Brownian motion
and related processes.
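
That the limiting proportions $(2 - \alpha) \Gamma(k + \alpha - 2) / (\Gamma(\alpha - 1) k!)$, $k \ge 1$, form a probability distribution can be seen from a telescoping sum: since $\Gamma(k + \alpha - 1)/k! - \Gamma(k + \alpha - 2)/(k - 1)! = (\alpha - 2) \Gamma(k + \alpha - 2)/k!$, the partial sum up to $K$ equals $1 - \Gamma(K + \alpha - 1)/(\Gamma(\alpha - 1) K!)$, which tends to 1. The following sketch, with illustrative function names, checks this identity numerically.

```python
from math import exp, lgamma

def limiting_fraction(alpha, k):
    """(2 - alpha) * Gamma(k + alpha - 2) / (Gamma(alpha - 1) * k!)."""
    return (2 - alpha) * exp(lgamma(k + alpha - 2) - lgamma(alpha - 1) - lgamma(k + 1))

def partial_sum_closed_form(alpha, K):
    """Telescoped value of sum_{k=1}^K limiting_fraction(alpha, k)."""
    return 1.0 - exp(lgamma(K + alpha - 1) - lgamma(alpha - 1) - lgamma(K + 1))

alpha = 1.5
direct = sum(limiting_fraction(alpha, k) for k in range(1, 51))
closed = partial_sum_closed_form(alpha, 50)
```

In particular the limiting proportion of blocks of size 1 is $2 - \alpha$, and the missing mass after $K$ terms decays only like $K^{\alpha - 2}$.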

To connect these results to Theorem 9, let $\Pi$ be the allelic partition obtained
by superimposing mutation marks at rate $\theta$ on the tree associated
with a Beta-coalescent, started at time 0 with infinitely many individuals.
Then $\Pi$ is an exchangeable partition and the restriction $\Pi_n$ of $\Pi$ to $\{1, \ldots, n\}$
has the same distribution as the partition described in Section 3.3. From this
and (9), one can show (see, e.g., Lemma 34) that $n^{\alpha - 2} |\Pi_n| \to_p S_\alpha$, where $S_\alpha$
is the constant on the right-hand side of (9). If one could show that this convergence
holds almost surely, then this would supply an alternative proof of
Theorem 9. Also, this would presumably work for coalescent processes satisfying
the condition of Theorem 1.9 in [8]. However, we note that proving
almost sure convergence is difficult due to the randomness of the asymptotic
frequencies $P_i$.
4. The lookdown process in a continuous random tree.

4.1. Branching processes obtained from superprocesses. The lookdown 
process is a powerful tool introduced (and subsequently modified) by Don- 
nelly and Kurtz [16] to encode the genealogy of a superprocess by a countable 
system of particles. We will describe it in a more general context than the 
one strictly needed for the applications we have in mind in this paper be- 
cause we believe that this construction is of independent interest. However, 
the lookdown process can be defined even more generally than how we will 
do here (e.g., we will not treat the case where the particles are allowed to 
have some spatial motion and interact). The setting for this part is the following.
We let $\psi$ be a branching mechanism with no Brownian component
and no drift term, that is, there exists $a \in \mathbb{R}$ and a Lévy measure $\nu$ such
that

(15) $$\psi(q) = aq + \int_0^\infty (e^{-qx} - 1 + qx 1_{\{x < 1\}}) \, \nu(dx).$$

Rather than associating with $\psi$ a CSBP with this branching mechanism,
we first construct a superprocess $M_t$ taking its values in the space of finite
measures on (0, 1), which is defined through its generator $L$: for a function
$F$ acting on measures on (0, 1),

(16) $$LF(\mu) = a \int_0^1 \mu(dx) F'(\mu, x) + \int_0^1 \mu(dx) \int_0^\infty \nu(dh) (F(\mu + h\delta_x) - F(\mu) - 1_{\{h < 1\}} h F'(\mu, x)).$$

The notation $F'(\mu, x)$ stands for $\lim_{\varepsilon \to 0} \varepsilon^{-1} (F(\mu + \varepsilon \delta_x) - F(\mu))$ and accounts
for an infinitesimal modification of $F$ in the direction $\delta_x$. If $\psi$ had a quadratic
term, then there would be an extra term in the generator; see equation (1.15)
in [14]. Note that for every $0 < r < 1$,

$$Z_t = M_t([0, r])$$

defines a $\psi$-CSBP started at $M_0([0, r])$. Indeed, applying the generator to a
function $F(\mu) = \varphi(z)$, where $z = \mu([0, r])$, directly yields that the generator
$L_1$ of the process $Z_t$ is

$$L_1 \varphi(z) = a \int_0^r \mu(dx) \varphi'(z) + \int_0^r \mu(dx) \int_0^\infty \nu(dh) (\varphi(z + h) - \varphi(z) - h 1_{\{h < 1\}} \varphi'(z)) = z L_2 \varphi(z)$$

since the second integral does not depend on $x$ and is equal to $L_2 \varphi(z)$,
where $L_2$ is the generator of a Lévy process with Lévy exponent $\psi(q)$. By
Lamperti's result relating a CSBP to a time-change of a Lévy process [35],
we conclude that $Z_t$ is a $\psi$-CSBP. The interpretation of $M_t$ is as follows.
If we imagine the population represented by $Z_t$ as a continuous population
where each individual is endowed with an originally distinct label between
0 and 1 (and where individuals and their descendants have the same label),
then $M_t([0, a])$ is the total number of individuals at time $t$ descending from
some individual with a label between 0 and $a$. Another process of interest in
this setting is the so-called ratio process $R_t = M_t/Z_t$, where $Z_t = M_t([0, 1])$.
Thus, for every $t$, $R_t$ is a probability distribution on (0, 1) which describes
the composition of the population at a given time: the typical state at time
$t > 0$ for $R_t$ (at least in the subcritical case, see below) is a linear combination
of Dirac masses $\sum_i \rho_i \delta_{x_i}$, subject to $\sum_i \rho_i = 1$, where each atom corresponds
to a group of individuals in the population at time $t$ descending from the
same individual at time 0 (whose label was $x_i$) in proportion $\rho_i$.

4.2. The lookdown process associated with a CSBP. The purpose of the 
Donnelly-Kurtz construction is to give a representation of the ratio pro- 
cess Rt as the limit of empirical distributions associated with a countable 
system of particles. A major consequence of this construction is a transpar- 
ent notion of genealogy for Zt, which is otherwise difficult to grasp in the 
context of a continuous population. What follows is largely inspired by [14] 
and [22], Chapter 5. To define the (modified) lookdown process, we have a 
countable number of individuals who will be identified with their type. Initially,
individual $i$ has type $\xi_i(0)$. The types $\xi_i(0)$ for $i = 1, 2, \ldots$ are given by
uniform i.i.d. random variables on (0,1). At any given time $t$, $\xi_i(t)$ will be
the type of the individual occupying level $i$. The variables $\xi_i(t)$ may change
due to events called birth events. Suppose we have a countable configuration
of space-time points,

$$\sum_i \delta_{(t_i, y_i)},$$

where $t_i \ge 0$ and $0 \le y_i \le 1$, and assume that $\sum_{t_i \le t} y_i^2 < \infty$ for all $t \ge 0$.
[Later, we will specify a point configuration $(t_i, y_i)$ associated with a CSBP.]
Each atom $(t_i, y_i)$ corresponds to a birth event. At such a time, a proportion
$y_i$ of levels is said to participate in the birth event: each level flips a coin
with probability of heads $y_i$. Those which come up heads participate in
the birth event. We describe the modification in the levels on the first $n$
levels. Suppose the levels participating are $1 \le i_1 < i_2 < \cdots < i_k \le n$. Then
at time $t = t_i$, their type is modified by the following rule: for all $1 \le j \le k$,
$\xi_{i_j}(t) = \xi_{i_1}(t^-)$. In other words, participating levels take the type of the
smallest level participating. We do not destroy the individuals previously
occupying levels $i_2, \ldots, i_k$, but, instead, we move $\xi_{i_2}(t^-)$ to the first level
not participating in a birth event and keep shifting individuals upward, with
each individual taking the first available spot. This is illustrated in Figure 4.

Fig. 4. Representation of the lookdown process. Levels 2, 4 and 5 participate in a birth
event. Other types get shifted upward. The numbers on the left and on the right indicate
the types before and after the birth event, respectively.
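
The birth-event rule can be sketched on the first $n$ levels as follows. This is a minimal illustration, not the authors' code: the function name `apply_birth_event` and the representation of types as list entries are choices made here. Levels are 1-based, as in the text.

```python
def apply_birth_event(types, participating):
    """Apply one birth event of the modified lookdown process to the first n levels.

    types[i - 1] is the type at level i just before the event; `participating`
    is the set of (1-based) levels whose coin came up heads.
    """
    n = len(types)
    levels = sorted(l for l in participating if 1 <= l <= n)
    if len(levels) < 2:
        # With fewer than two participants among the first n levels, nothing changes there.
        return list(types)
    parent_type = types[levels[0] - 1]
    newborn_levels = set(levels[1:])
    old = iter(types)  # previous occupants, kept in their original order
    # Newborn levels copy the lowest participant; every other level receives the
    # next previous occupant, so displaced individuals are shifted upward.
    return [parent_type if level in newborn_levels else next(old)
            for level in range(1, n + 1)]

before = ["a", "b", "c", "d", "e", "f", "g"]
after = apply_birth_event(before, {2, 4, 5})
```

With levels 2, 4 and 5 participating, as in Figure 4, `after` is `["a", "b", "c", "b", "b", "d", "e"]`: types `f` and `g` have been pushed above level 7.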

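One way to make this construction rigorous is to observe that due to our
assumption $\sum_{t_i \le t} y_i^2 < \infty$, only finitely many birth events affect the first $n$
levels in any compact time-interval. Indeed, a birth event with parameter $y_i$
changes the first $n$ levels only if at least two of them participate, which has
probability at most $\binom{n}{2} y_i^2$, so the Borel-Cantelli lemma applies. The processes
defined by this procedure are consistent by restriction as $n$ increases, so there
is a well-defined process $(\xi_i(t), t \ge 0, i = 1, 2, \ldots)$ by Kolmogorov's extension theorem.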

Having described the construction for a general configuration of space-time
points $(t_i, y_i)$, we now restrict to the case where $(t_i, y_i)$ is given by
the following construction. Let $Z_t(r)$ be a $\psi$-CSBP, where $\psi$ has the form
(15) and where we have written the starting point $r > 0$ as an argument
of $Z_t$. Let $\tau$ be the extinction time (which may not be finite a.s., but will
be in the subcritical case in which we are interested). We only define the
lookdown process until time $\tau^-$. With each time $t_i$ such that $\Delta Z_{t_i} > 0$,
associate $y_i = \Delta Z_{t_i}/Z_{t_i}$ (observe that $0 \le y_i \le 1$). It is then standard to
check that if $t < \tau$, then

$$\sum_{t_i \le t} y_i^2 < \infty.$$

Indeed, one can bound $Z_{t_i}$ from below by $I_t = \inf_{0 \le s \le t} Z_s > 0$, so that this
sum is smaller than

$$(I_t)^{-2} \sum_{t_i \le t} (\Delta Z_{t_i})^2 < \infty$$

because $Z$ is obtained as a time-change of a Lévy process whose jumps are
square-summable due to the fact that $\int_0^\infty (1 \wedge x^2) \, \nu(dx) < \infty$ and when $t < \tau$,
the jumps of $Z$ are the jumps of the Lévy process in some random, but finite,
time-interval.

Thus, there is a well-defined lookdown process $(\xi_i(t), t \ge 0, i = 1, 2, \ldots)$
associated with this sequence $(t_i, y_i)$. Observe that for all $t \ge 0$, $(\xi_i(t), i = 1, 2, \ldots)$
is an exchangeable sequence so that the limit

$$\rho_t = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \delta_{\xi_i(t)}$$

is well defined by De Finetti's theorem. Then $(\rho_t, t \ge 0)$ has the same
distribution as the process $(R_t, t \ge 0)$ obtained in the previous section from a
superprocess $M_t$ started from $M_0 = r 1_{\{0 < x < 1\}} \, dx$ (see, e.g., the argument
starting from (2.15) in [14]). To understand heuristically why this is true,
note that when there is a jump in the CSBP, so $\Delta Z_t = x > 0$, some individual
in the population has a large number of offspring, causing the proportion
of individuals with the same type as this individual to have a jump of size
$x/(Z_{t^-} + x) = \Delta Z_t/Z_t$. This is precisely what happens in the lookdown process.

We now specialize to the subcritical case. That is, we assume that $\psi$ is a
branching mechanism as in (15) and that

$$\int_1^\infty \frac{dq}{\psi(q)} < \infty.$$

By a well-known criterion of Grey [27], this ensures that $\tau < \infty$ a.s., that
is, the population becomes extinct in finite time. Observe that one of the 
nontrivial features of the lookdown process is that since Zt becomes extinct 
in finite time, almost surely only finitely many individuals have descendants 
alive at time t > 0, which means that the composition of the population is 
made of finitely many different types of individuals and that, ultimately, 
only one type remains in the population. Note that this can happen in the 
lookdown process even though we never kill labels because some labels get 
pushed off to infinity due to the successive birth events, thus disappearing 
from the visible population. This feature will become apparent from our 
construction of the lookdown process in terms of the continuous random 
tree. 
4.3. Constructing the lookdown process from a continuous tree. In this
section, we will provide a construction of the lookdown process from a continuous
random tree. Once again, we emphasize that the branching mechanism
need not be stable. However, we will always assume subcriticality, that is,

$$\int_1^\infty \frac{dq}{\psi(q)} < \infty,$$

so that there is a continuous version of the height process and its local times
are well defined (see [17]).

Before we start, we need to recall some facts about the height process.
Associated with the process $H$ is an infinite measure $N$ which plays a role
analogous to Itô's excursion measure for Brownian motion (see [47]). The
excursion property for $(H_t, t \le T_r)$ will be used on several occasions. It can
be phrased as follows. Let $(g_i, d_i)$, $i \in \mathcal{I}$, be the excursion intervals of $H$ above
zero, so

$$\bigcup_{i \in \mathcal{I}} (g_i, d_i) = \{s > 0 : H_s > 0\}.$$

For each $i \in \mathcal{I}$, define the function $e_i$ by $e_i(s) = H_{g_i + s}$ for $0 \le s \le d_i - g_i$ and
$e_i(s) = 0$ otherwise. Let $C_+([0, \infty))$ be the set of nonnegative real-valued
continuous functions defined on $[0, \infty)$. Recall that $(L_s^a, s \ge 0, a \ge 0)$ is the local time
process for $H$. Then the random measure

(17) $$\sum_{i \in \mathcal{I}} \delta_{(L^0_{g_i}, e_i)}$$

is a Poisson point process on $[0, \infty) \times C_+([0, \infty))$ with intensity measure
$dl \times N(d\omega)$, where $dl$ denotes Lebesgue measure and $N(d\omega)$ is the excursion
measure, which is a $\sigma$-finite measure on $C_+([0, \infty))$. More generally, $H$
(although not a Markov process in general) enjoys a similar excursion property
above any given level $a > 0$. For each $a > 0$, let $(g_i^a, d_i^a)$, $i \in \mathcal{I}^a$, be the
connected components of the open set $\{s : H_s > a\}$. For each $i \in \mathcal{I}^a$, define
the excursion $e_i^{(a)}$ by $e_i^{(a)}(s) = H_{g_i^a + s} - a$ for $0 \le s \le d_i^a - g_i^a$ and $e_i^{(a)}(s) = 0$
otherwise. For each $s \ge 0$, define

$$\tilde{\tau}_s^a = \inf\left\{t : \int_0^t 1_{\{H_r \le a\}} \, dr > s\right\}, \qquad \tau_s^a = \inf\left\{t : \int_0^t 1_{\{H_r > a\}} \, dr > s\right\}.$$

Define the processes $(\tilde{H}_s^a, s \ge 0)$ and $(H_s^a, s \ge 0)$ such that $\tilde{H}_s^a = H_{\tilde{\tau}_s^a}$ and
$H_s^a = H_{\tau_s^a} - a$. By Proposition 3.1 of [18], the random measure

(18) $$\sum_{i \in \mathcal{I}^a} \delta_{(L^a_{g_i^a}, e_i^{(a)})}$$

is a Poisson point process on $[0, \infty) \times C_+([0, \infty))$ with intensity measure
$dl \times N(d\omega)$ and is independent of $(\tilde{H}_s^a, s \ge 0)$. Since $H^a$ can be recovered from
the random measure (18), a consequence of this result is that $(H_s^a, s \ge 0)$
has the same law as $(H_s, s \ge 0)$ and is independent of $(\tilde{H}_s^a, s \ge 0)$.

Having recalled this property, we now describe our construction of the
lookdown process in a continuous random tree. Let $(Z_t, t \ge 0)$ be a $\psi$-CSBP
started from $Z_0 = r > 0$ ($\psi$ is assumed to be subcritical) and assume that
$Z_t$ is obtained as the local times of the height process $(H_t, t \le T_r)$, as in
(6). Let $\xi := (\xi_j(t), t \ge 0, j = 1, 2, \ldots)$ be a lookdown process obtained
from $(Z_t, t \ge 0)$, as in the previous section. That is, it is obtained from the
configuration of space-time points $(t_i, \Delta Z_{t_i}/Z_{t_i})$. The process $\xi$ will serve
as a reference lookdown process to which we will compare the one we will
construct below.

We will now construct a version $\zeta$ of the process $\xi$ that will be entirely
defined in terms of the height process $H$. We start by introducing some
notation. Consider the height process $(H_t, t \le T_r)$. The key point of this
construction is that we choose a specific labeling for the excursions; namely,
we rank the excursions according to their supremum. We denote by $e_j^{(t)}$ the
$j$th highest excursion above the level $t$ (when $t = 0$, we sometimes simply
write $e_j$ instead of $e_j^{(0)}$). We draw a sequence of i.i.d. random variables
$(U_j)_{j \in \mathbb{N}}$ with the uniform distribution on (0, 1). They will serve as the initial
types in the lookdown construction, so that at any time, $\zeta_j(t)$ is equal to
one of the $U_j$'s. Thus, let $\zeta_j(0) = U_j$ for all $j \ge 1$. Then for each $t > 0$,
for each $j \ge 1$, we let $k(j, t)$ be the unique integer such that $e_j^{(t)}$, the $j$th
highest excursion above $t$, is part of the excursion $e_{k(j,t)}^{(0)}$, the $k(j, t)$th highest
excursion above 0, and we let

$$\zeta_j(t) = U_{k(j,t)}.$$

We say that the excursion $e_j^{(t)}$ has type $U_{k(j,t)}$.
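
The assignment of types to excursions can be illustrated on a discretized height function. The sketch below is a toy version only: it replaces the continuous height process by a finite list of heights, assumes that the suprema of distinct excursions are distinct, and uses function names invented here.

```python
def excursions_above(h, level):
    """Maximal index intervals (start, end), inclusive, on which h > level."""
    intervals, start = [], None
    for i, value in enumerate(h):
        if value > level and start is None:
            start = i
        elif value <= level and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(h) - 1))
    return intervals

def ranked_excursions(h, level):
    """Excursions above `level`, ordered by decreasing supremum."""
    return sorted(excursions_above(h, level),
                  key=lambda iv: -max(h[iv[0]:iv[1] + 1]))

def types_at_level(h, t, initial_types):
    """Type of the j-th highest excursion above t: the initial type of the
    excursion above 0 that contains it (initial_types[k - 1] plays the role of U_k)."""
    base = ranked_excursions(h, 0)
    result = []
    for start, end in ranked_excursions(h, t):
        k = next(i for i, (s, e) in enumerate(base) if s <= start and end <= e)
        result.append(initial_types[k])
    return result

h = [0, 1, 3, 1, 2, 1, 0, 2, 5, 2, 0]
types = types_at_level(h, 1.5, [0.3, 0.8])
```

Here the two excursions above 0 have suprema 5 and 3 and carry types 0.3 and 0.8. Above level 1.5 there are three excursions, with suprema 5, 3 and 2; the last two are both part of the second excursion above 0, so `types` is `[0.3, 0.8, 0.8]`.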

Theorem 14. The processes $\zeta$ and $\xi$ have the same distribution. That
is, $(\zeta_j(t), t \ge 0, j = 1, 2, \ldots)$ has the distribution of the modified lookdown
construction associated with the CSBP $(Z_t, t \ge 0)$.

Before we start proving this result, here is a description of the dynamics of
the process $(\zeta_j(t), t \ge 0)$. As $t$ increases, the relative ranking of the excursions
above $t$ evolves. If $\Delta Z_t > 0$, then this means that with probability one,
$H$ has (infinitely many) local minima at $t$, resulting in (infinitely many)
additional excursions above $t$. Indeed, note that by Theorem 4.7 in [18], this
corresponds to a unique excursion above $t^-$ splitting into infinitely many
excursions. Moreover, all local minima of $(H_t, t \ge 0)$ are in fact associated
with jumps of $Z_t$ (this would not be true if $\psi$ had a quadratic term; see
Theorem 4.7 of [18]). We then say that some birth event happens. We rerank
all excursions according to their new order (again, given by the rank of their
supremum). Old excursions keep their old type (but might change their level)
and the newly added excursions take the type from their father. If excursion
$e_j^{(t)}$ splits, then this means that many levels $k$ with $k > j$ take the type $\zeta_j(t)$.
Those who do not take this type get shifted upward accordingly. To use the
Donnelly-Kurtz terminology, we say that the levels $k > j$ adopting the type
$\zeta_j(t)$ take part in the birth event.

Let $\mathcal{F} = (\mathcal{F}_a, a \ge 0)$ be the filtration such that $\mathcal{F}_a = \sigma(\tilde{H}^b, b \le a)$. The
key observation for the proof of Theorem 14 is summarized by the following
lemma.

Lemma 15. Let $a > 0$ be a stopping time of the filtration $\mathcal{F}$ such that
$\Delta Z_a > 0$ a.s. Define a sequence $(\varepsilon_i)_{i \in \mathbb{N}}$ by $\varepsilon_i = 1$ if the level $i$ takes part in
the birth event at time $a$ for the process $\zeta$ (i.e., the $i$th highest excursion
above $a$ is a newly created excursion) and 0 otherwise. Then the distribution
of the sequence $(\varepsilon_i)_{i \in \mathbb{N}}$ is that of a sequence of i.i.d. Bernoulli variables with
parameter $\Delta Z_a/Z_a$.

Proof. We know (see Theorem 4.7 in [18]) that if $\Delta Z_a > 0$, then $a$
is necessarily a level where exactly one excursion is splitting into infinitely
many smaller ones (i.e., $a$ is a level where $H$ reaches a multiple infimum and
for $b < a$, all those infima are reached within the same excursion above $b$).
In other words, if a is a jump time of Z, there is a unique interval (s, t) such 
that = and Lf - ^ = AZa. Let us denote x = and y = L^. 

For $i \ge 1$, define $h_i^{(a)} := \max e_i^{(a)}$ to be the height of the $i$th highest excursion
above level $a$ and let $t_i^{(a)}$ denote the local time accumulated at level
$a$ when the excursion $e_i^{(a)}$ starts. By applying the strong Markov property,
which will be proved at the very end of this section in Lemma 17, we see
that conditionally on $Z_a$, the process $H^a$ has the same distribution as $H$
run until $T_{Z_a}$. Hence, the atoms $(t_i^{(a)}, h_i^{(a)})$ form a Poisson point process on
$[0, Z_a] \times \mathbb{R}_+$ with intensity measure $dt \times n(dh)$, where $n$ is absolutely continuous
with respect to the Lebesgue measure, $n(0, \infty) = \infty$ and $n(h, \infty) < \infty$
for $h > 0$. The measure $n$ is the "law" of the heights of excursions under the
measure $N$.

Observe that the levels that take part in the birth event are exactly the
levels $k$ which correspond to the rank of a newly created excursion $e_k^{(a)}$,
that is, an excursion such that $t_k^{(a)} \in (x, y)$, where $(x, y)$ is the new interval
of local time. The statement then amounts to the well-known fact about
Poisson point processes that the $t_j$ (observe that $t_j$ is the time of the $j$th
record of the Poisson point process) are i.i.d. uniformly distributed random
variables over $(0, Z_a)$ and are independent of the sequence of records $h_j$.
As the events $\{t_j \in (x, y)\}$ and $\{\varepsilon_j = 1\}$ coincide, the conclusion follows. □
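
The Poisson point process fact used at the end of this proof can be illustrated by a small Monte Carlo experiment. The sketch below is a simplification under stated assumptions: the infinite-intensity measure $n$ is replaced by a finite number of atoms with exponential heights, and the values of $x$, $y$ and $z$ (standing for $Z_a$) are arbitrary. It only checks that, after ranking atoms by height, the indicators of having the local-time coordinate in $(x, y)$ behave like independent Bernoulli variables with parameter $(y - x)/z$.

```python
import random

def participation_indicators(n_atoms, x, y, z, rng):
    """Rank atoms (local time, height) by decreasing height and report, for each
    rank, whether the local-time coordinate falls in (x, y)."""
    atoms = [(rng.random() * z, rng.expovariate(1.0)) for _ in range(n_atoms)]
    atoms.sort(key=lambda atom: -atom[1])
    return [1 if x < t < y else 0 for t, _ in atoms]

def estimate(n_trials=20000, n_atoms=5, x=0.2, y=0.5, z=1.0, seed=0):
    """Empirical frequencies of {eps_1 = 1}, {eps_2 = 1} and {eps_1 = eps_2 = 1}."""
    rng = random.Random(seed)
    first = second = both = 0
    for _ in range(n_trials):
        eps = participation_indicators(n_atoms, x, y, z, rng)
        first += eps[0]
        second += eps[1]
        both += eps[0] * eps[1]
    return first / n_trials, second / n_trials, both / n_trials

p1, p2, p12 = estimate()
```

With these parameters $(y - x)/z = 0.3$, so `p1` and `p2` should be close to 0.3 and `p12` close to 0.09, the value expected under independence.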

Now, fix $\varepsilon > 0$. Let $a_1$ be the first time $t$ such that $\Delta Z_t/Z_t > \varepsilon$. Observe
that almost surely $a_1 > 0$ and that $a_1$ is a stopping time for $\mathcal{F}$. We may
thus define, inductively, $a_1 < a_2 < \ldots$, the set of stopping times such that
$\Delta Z_t/Z_t > \varepsilon$ and for each $i \ge 1$, $a_i$ is a stopping time of $\mathcal{F}$. For $i \ge 1$, a multiple
infimum is reached at level $a_i$, which corresponds to a single excursion that
splits into an infinite number of descendants at this precise level. Define a
process $(\zeta_j^{(\varepsilon)}(t), t \ge 0, j = 1, 2, \ldots)$ as follows:

• if $t$ is not a jump time for $Z$, then nothing happens for $\zeta^{(\varepsilon)}$, that is, we
have $\zeta^{(\varepsilon)}(t^-) = \zeta^{(\varepsilon)}(t)$;

• if $t$ is a jump time for $Z$, but $\Delta Z_t/Z_t \le \varepsilon$, we use an independent coin
flipping with probability of heads $y = \Delta Z_t/Z_t$, and the standard Donnelly-Kurtz
procedure, to obtain $\zeta^{(\varepsilon)}(t)$ from $\zeta^{(\varepsilon)}(t^-)$;

• if $t$ is a jump time for $Z$ and $\Delta Z_t/Z_t > \varepsilon$ (i.e., $t = a_i$ for some $i$), we say
that the levels which take part in the birth event are exactly the relative
ranks of the newly created excursions at level $t$.

Lemma 16. For each fixed $\varepsilon > 0$, the processes $\zeta^{(\varepsilon)}$ and $\xi$ have the same
distribution.

Proof. We only need to show that our new rule for the times $a_i$ does
not differ from the usual construction. As the $a_i$'s are a sequence of stopping
times, we can apply Lemma 15 to see that we are again deciding who takes
part in the birth event according to a sequence of i.i.d. Bernoulli variables
with parameters $\Delta Z_{a_i}/Z_{a_i}$. The strong Markov property also implies that
the sequences used at the successive times $a_i$ are independent. Hence, $\zeta^{(\varepsilon)}$
has the same distribution as $\xi$. □

Proof of Theorem 14. Let $b_1, \ldots, b_m$ be the times at which there
is a change in the first $n$ levels for the process $\zeta$ (the number $m$ of such
times is necessarily at most $n - 1$ since at each of the $b_i$, the diversity of
types among the first $n$ levels must be reduced at least by 1). Let $F$ be
a bounded functional on the Skorokhod space $D(\mathbb{R}_+, \mathbb{R}^{\mathbb{N}})$ endowed with the
product topology inherited from $D(\mathbb{R}_+, \mathbb{R})$ and assume that $F$ only depends
on the first $n$ coordinates (levels) for some arbitrarily fixed number $n \ge 1$.
Then

(19) $$|E(F(\zeta)) - E(F(\zeta^{(\varepsilon)}))| \le \|F\|_\infty P(\{b_1, \ldots, b_m\} \not\subset \{a_1, a_2, \ldots\})$$

because when $\{b_1, \ldots, b_m\} \subset \{a_1, a_2, \ldots\}$, the first $n$ coordinates of $\zeta^{(\varepsilon)}$ and
$\zeta$ coincide exactly. Since $\zeta^{(\varepsilon)}$ and $\xi$ have the same distribution, by Lemma
16, we deduce that

(20) $$|E(F(\zeta)) - E(F(\xi))| \le \|F\|_\infty P(\{b_1, \ldots, b_m\} \not\subset \{a_1, a_2, \ldots\}).$$

Note that

$$\lim_{\varepsilon \to 0} P(\{b_1, \ldots, b_m\} \not\subset \{a_1, a_2, \ldots\}) = 0.$$

Indeed, there are only finitely many jumps affecting the first $n$ levels, so

$$\eta := \inf_{t \in \{b_1, \ldots, b_m\}} \frac{\Delta Z_t}{Z_t} > 0 \quad \text{a.s.}$$

Since $\{b_1, \ldots, b_m\} \not\subset \{a_1, a_2, \ldots\}$ is equivalent to $\eta \le \varepsilon$, we see that

$$P(\{b_1, \ldots, b_m\} \not\subset \{a_1, a_2, \ldots\}) = P(\eta \le \varepsilon) \to 0$$

as $\varepsilon \to 0$ because $\eta > 0$ a.s. It follows by letting $\varepsilon \to 0$ in (20) that the
restrictions of $\zeta$ and $\xi$ to the first $n$ coordinates are identical in distribution.
By the uniqueness in Kolmogorov's extension theorem, the processes $\zeta$ and
$\xi$ are thus identical in distribution. □

It now remains to establish the strong Markov property, which we used
on several occasions. Note that this lemma holds even at stopping times $T$
such that $\Delta Z_T > 0$.

Lemma 17. Let $T$ be a stopping time of $\mathcal{F}$. Conditionally on $Z_T = z$,
the processes $H^T$ and $\tilde{H}^T$ are independent. Moreover, $H^T$ is distributed as
$(H_t, t \le T_z)$.

Proof. When $T = s$ is a deterministic stopping time, then this is the
content of Corollary 3.2 in [18]. Suppose we now try to verify the claim when
$T$ is a stopping time of $\mathcal{F}$ which can only take a countable number of values
$\{t_k\}$, say. Let $F$, $G$ be two nonnegative functions defined on $C([0, \infty))$ and
assume that they are continuous for the topology of uniform convergence on
compact sets. Since $\{T = t_k\}$ is $\mathcal{F}_{t_k}$-measurable, we then have

$$E[F(\tilde{H}_t^T, t \ge 0) G(H_t^T, t \ge 0) \mid Z_T = z]$$
$$= \sum_{k \ge 0} E[F(\tilde{H}_t^{t_k}, t \ge 0) G(H_t^{t_k}, t \ge 0) 1_{\{T = t_k\}} \mid Z_{t_k} = z]$$
$$= \sum_{k \ge 0} E[G(H_{t \wedge T_z}, t \ge 0)] E[F(\tilde{H}_t^{t_k}, t \ge 0) 1_{\{T = t_k\}} \mid Z_T = z]$$
$$= E[G(H_{t \wedge T_z}, t \ge 0)] E[F(\tilde{H}_t^T, t \ge 0) \mid Z_T = z].$$
To extend this to stopping times taking a continuous set of values, we use
standard approximations of the stopping time $T$ by

$$T_n = \sum_{k \ge 0} \frac{k + 1}{2^n} 1_{\{k/2^n \le T < (k+1)/2^n\}}.$$

Note that $T_n$ approaches $T$ from above within $2^{-n}$. To begin, observe that

$$\int_0^{T_r} 1_{\{T < H_u < T_n\}} \, du = \int_T^{T_n} Z_a \, da,$$

which, by (right) continuity of $Z$ at $T$, is smaller than $C 2^{-n}$ for $n$ sufficiently
large a.s. To see that $H^{T_n}$ approaches uniformly $H^T$, we think of the
following picture. There are two sources of difference between $H^{T_n}$ and $H^T$.
One is a shift downward for the excursions above $T_n$ because the parts of an
excursion between $T$ and $T_n$ are erased in $H^{T_n}$. This shift is at most $2^{-n}$.
The other source is that there may be some excursions above $T$ that are not
counted as excursions above $T_n$, or an excursion above $T$ could be split into
two or more excursions above $T_n$ because of a local minimum between $T$ and
$T_n$. This results in a horizontal shift. The total duration of this horizontal
shift may never exceed the total time spent by $H$ in the strip $[T, T_n]$, which
is not more than $C 2^{-n}$, by the above remark. Hence, by uniform continuity
of $H$, $H^{T_n}$ approaches uniformly $H^T$. A moment's thought shows that the
same reasoning applies to $\tilde{H}^{T_n}$ (and this does not require left continuity of
$Z$ at $T$).

Therefore, if $F$, $G$ are, as above, two bounded, nonnegative and continuous
functions on $C([0, \infty))$ and if $\varphi$ is also a bounded, continuous, nonnegative
function on $\mathbb{R}$, since $T_n$ is a stopping time that takes only countably many
values, we have

$$E[F(\tilde{H}_t^{T_n}, t \ge 0) G(H_t^{T_n}, t \ge 0) \varphi(Z_{T_n})]$$
$$= \int_0^\infty P(Z_{T_n} \in dz) \varphi(z) E[G(H_{t \wedge T_z}, t \ge 0)] E[F(\tilde{H}_t^{T_n}, t \ge 0) \mid Z_{T_n} = z].$$

If $H'$ is another height process, independent of everything else, and if $L_n =
\inf\{t > 0 : L_t^0(H') > Z_{T_n}\}$, this can be rewritten as

$$E[F(\tilde{H}_t^{T_n}, t \ge 0) G(H_t^{T_n}, t \ge 0) \varphi(Z_{T_n})]$$
$$= E[F(\tilde{H}_t^{T_n}, t \ge 0) G(H'_{t \wedge L_n}, t \ge 0) \varphi(Z_{T_n})].$$

Note that if $L = \inf\{t > 0 : L_t^0(H') > Z_T\}$, then $(H'_{t \wedge L_n}, t \ge 0) \to (H'_{t \wedge L}, t \ge 0)$
uniformly almost surely. Indeed, because $Z_T$ is independent of $H'$, it suffices
to show, by Fubini's theorem, that for a given $z$, $(H'_{t \wedge T'_{z + \varepsilon}}, t \ge 0)$ converges
uniformly almost surely as $\varepsilon \to 0$ to $(H'_{t \wedge T'_z}, t \ge 0)$, where $T'_\cdot$ is the inverse
local time at 0 of $H'$. To see this, dropping the prime from the notation,
first note that $T_\cdot$ is continuous at $z$ almost surely because it is a subordinator
and, as such, does not have fixed discontinuities. Moreover, note that
$\sup_{T_z \le s \le T_{z + \varepsilon}} H_s$ is the supremum of the heights of the excursions between
$T_z$ and $T_{z + \varepsilon}$. By the excursion theory for $H$, this can be written as
$S_\varepsilon = \sup_{l_i \le \varepsilon} h(e_i)$, where $(l_i, h(e_i))$ is the Poisson point process of the heights
of the excursions on an interval of duration $\varepsilon$. For any $\delta > 0$, excursions of
height greater than $\delta$ have finite measure under $N$ and therefore $S_\varepsilon < \delta$ for
sufficiently small $\varepsilon$. It follows that $S_\varepsilon \to 0$ as $\varepsilon \to 0$ almost surely or, in other
words,

$$\|H_{t \wedge T_{z + \varepsilon}} - H_{t \wedge T_z}\|_\infty \to 0$$

almost surely. Therefore, $(H_{t \wedge T_{z + \varepsilon}}, t \ge 0)$ converges uniformly to $(H_{t \wedge T_z}, t \ge 0)$
a.s.

Since, on the other hand, $\tilde{H}^{T_n}$ converges uniformly to $\tilde{H}^T$ a.s., and since,
similarly, $H^{T_n}$ converges a.s. uniformly to $H^T$ in the left-hand side, we conclude,
by Lebesgue's dominated convergence theorem, that

$$E[F(\tilde{H}_t^T, t \ge 0) G(H_t^T, t \ge 0) \varphi(Z_T)]$$
$$= E[F(\tilde{H}_t^T, t \ge 0) G(H'_{t \wedge L}, t \ge 0) \varphi(Z_T)].$$

From this, we immediately deduce, by conditioning on $Z_T = z$, the desired
identity,

$$E[F(\tilde{H}_t^T, t \ge 0) G(H_t^T, t \ge 0) \mid Z_T = z]$$
$$= E[G(H_{t \wedge T_z}, t \ge 0)] E[F(\tilde{H}_t^T, t \ge 0) \mid Z_T = z]. \quad □$$

4.4. Proofs of Theorem 1 and Corollary 2. 

Proof of Theorem 1. By Theorem 2.1 in [14], the time-changed genealogy
of $Z$, as defined from the lookdown process, is a Beta$(2 - \alpha, \alpha)$-coalescent.
It then suffices to show that the notion of genealogy as we have
defined it from the height process coincides with the notion of genealogy for
the lookdown process constructed on the CRT.

There is a natural notion of genealogy associated with the lookdown construction.
Namely, for any pair $i, j \ge 1$ and any times $0 \le t \le T$, we can
decide if the levels $i$ and $j$ at time $T$ descend from the same level at time
$t$ (more precisely, we can track their labels by going backward from time $T$
to time $t$ to see if they come from the same label).

When the lookdown construction is obtained (as explained above) from
the process $H$, this means that levels $i$ and $j$ at time $T$ have the same
ancestor at time $t$ if and only if the $i$th and $j$th highest excursions above $T$
are descendants of the same excursion above $t$.

Recall that $(V_i, i = 1, 2, \ldots)$ is a sequence of variables in $[0, T_r]$ where each
$V_i$ is the left endpoint of the $i$th highest excursion above $R^{-1}(t)$. It is clear
that if two excursions $e^{(i)}$ and $e^{(j)}$ above $R^{-1}(t)$ descend from
the same excursion above $s$, then $V_i$ and $V_j$ are straddled by this excursion
above $s$ or, in other words, that $\min_{r \in [V_i, V_j]} H(r) > s$. Hence, we see that the
partition-valued process $(\Pi(s), 0 \le s \le t)$ such that $i$ and $j$ are in the same
block of $\Pi(s)$ if and only if $\min_{r \in [V_i, V_j]} H(r) > R^{-1}(t - s)$, is exactly the
process of the ancestral partition of the lookdown process between times
$R^{-1}(t)$ and $R^{-1}(t - s)$. By applying Theorem 2.1 in [14], this entails that
when $H$ is the height process associated with the $\alpha$-stable branching mechanism,
$\Pi$ is a Beta$(2-\alpha, \alpha)$-coalescent; this was the content of Theorem 1.
□

Proof of Corollary 2. Again, observe that the genealogy as defined
from the lookdown process coincides with the following definition: $i$ and
$j$ are in the same block of $\Pi_s$ if the $i$th and the $j$th highest excursions
above level $R^{-1}(t)$ are subexcursions of a single excursion above $R^{-1}(t - s)$.
Let $N_s$ be the number of excursions between $R^{-1}(t - s)$ and $R^{-1}(t)$
and, conditionally on $N_s = k$, number these excursions in random order
$e_1, \ldots, e_k$, and let $\ell_1, \ell_2, \ldots$ be their respective local times at $R^{-1}(t)$. We
want to show that the asymptotic frequency of the block corresponding to
an excursion is proportional to its local time $\ell$. However, reasoning as in Lemma 15, we
see that, conditionally on $N_s = k$ and conditionally on $\ell_1, \ell_2, \ldots, \ell_k$, each
level in the lookdown process at time $R^{-1}(t)$ falls in excursion $e_i$ with a
probability that is equal to $\ell_i / \sum_{j=1}^{k} \ell_j$. It follows immediately from the law
of large numbers that the asymptotic frequency of the block associated with
$e_i$ is $\ell_i / \sum_{j=1}^{k} \ell_j$. In other words, the sequence of ranked frequencies of the
ancestral partition defined by the lookdown process is almost surely equal
to the process $(X(s), 0 \le s \le t)$. Corollary 2 immediately follows. □

5. Small-time behavior and multifractal spectrum. In this section, we 
use Theorem 1 to prove Theorems 4 and 5. We start by introducing our 
main tool, reduced trees. 

5.1. Reduced trees as Galton-Watson processes. The key ingredient for
the theorems in this section is the reduced tree associated with a height
process $H$. For a fixed level $a$, the reduced tree at level $a$ is a tree such that the
number of branches of the tree at height $0 \le t \le 1$ is the number of excursions
of $H$ above level $at$ that reach level $a$, with the natural genealogical structure
defined by saying that $v$ is an ancestor of $w$ if the excursion associated with
$w$ is contained in the excursion associated with $v$. We will deduce from results of Duquesne and Le Gall [17]
that when $H$ is the height process associated with the $\alpha$-stable branching
mechanism, this tree is a Galton-Watson tree whose reproduction law can
be described explicitly.

When the Beta-coalescent is constructed from the continuous random
tree, the number of blocks $N(s)$ at time $s$ corresponds to the number of
excursions above level $s'$ that reach level $R^{-1}(t)$, for some $s'$ and $t$. We can
deduce the limiting behavior of $N(s)$ when $s \to 0$ from the limiting behavior
of the reduced tree as $s' \to R^{-1}(t)$. However, because the reduced tree is a
Galton-Watson tree, its limiting behavior is described by the Kesten-Stigum
theorem, as stated in (22) below, and this leads to a proof of Theorem 4.
Likewise, Theorem 5 is established by relating the multifractal spectrum
of Beta-coalescents to the multifractal spectrum for Galton-Watson trees
and then applying recent results of Mörters and Shieh [41] on the branching
measure of Galton-Watson trees. An important step in the proof of these
theorems is showing that events concerning the reduced tree at a fixed level
can be carried over to the reduced tree at the random level $R^{-1}(t)$.

We now introduce more carefully the concept of reduced trees. We start
with some notation. If $u > 0$, let $N_{(u)}$ denote the excursion measure of the
height process, conditioned to hit level $u$,

\[ N_{(u)}(\cdot) = N\Bigl(\,\cdot \,\Big|\, \sup_{s \ge 0} H_s > u\Bigr), \]

which is well defined since $N(\sup_{s \ge 0} H_s > u) < \infty$ for all $u > 0$. Let $(H_s, s \le \zeta)$
be a realization of $N_{(u)}$ and consider the process $(\theta^u(t), 0 \le t < u)$ defined
by $\theta^u(t) = \#\mathrm{exc}_{t,u}$, the number of excursions of $H$ above level $t$ reaching $u$.
Simple arguments show that almost surely for all $t < u$, we have $\theta^u(t) < \infty$.

Definition 18. The reduced tree $\mathbf{T}^u$ at level $u$ associated with $(H_s, s \le \zeta)$
is the tree encoded by the process $(\theta^u(tu), 0 \le t \le 1)$. In other words, each
branch at level $0 \le t \le 1$ is associated with a unique excursion above level
$tu$ reaching $u$.
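
To make Definition 18 concrete, here is a minimal sketch on a discretized path, assuming the height process is given as a finite list of heights (an illustration only; the function name `count_excursions_reaching` is ours and is not part of the paper). It counts the excursions above level $t$ that reach level $u$, which is the quantity encoding the reduced tree.

```python
def count_excursions_reaching(path, t, u):
    """Count maximal runs of `path` strictly above level t whose maximum is >= u."""
    count = 0
    inside = False   # currently inside an excursion above level t
    reached = False  # the current excursion has touched level u
    for h in path:
        if h > t:
            inside = True
            if h >= u:
                reached = True
        else:
            if inside and reached:
                count += 1
            inside = False
            reached = False
    if inside and reached:  # path ended while still above level t
        count += 1
    return count


# Toy path (hypothetical data): two excursions above 0, both reaching height 3.
path = [0, 1, 3, 1, 3, 0, 2, 3, 0]
u = 3
profile = [count_excursions_reaching(path, t, u) for t in (0, 0.5, 1, 2)]
```

On this toy path the count is nondecreasing in $t$, reflecting the fact that an excursion above a low level splits into several sub-excursions above a higher level; this branching structure is what the reduced tree records.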

In the context of quadratic branching where the height process is reflecting
Brownian motion, this is a variant of a process already considered by Neveu
and Pitman [42]. We should emphasize that, by a slight abuse of notation,
we will sometimes use the notation $\mathbf{T}^u$ even when the underlying process
$(H_s, s \le \zeta)$ is not a realization of $N_{(u)}$ but, rather, the height process considered
until time $T_r$, where it has accumulated local time $r$ at zero. In this
case, $\mathbf{T}^u$ is, in fact, a forest consisting of a Poisson number of independent
realizations of the tree of Definition 18. The following fact will be a crucial
tool for much of our analysis. It states that up to a deterministic exponential
time-change, the tree $\mathbf{T}^u$ is a continuous-time supercritical Galton-Watson
(discrete) tree. We recall that here the branching mechanism is assumed to
be stable.

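Proposition 19. For fixed $u > 0$, the process $(\theta^u(u(1 - e^{-t})), 0 \le t < \infty)$
is a continuous-time Galton-Watson process where individuals reproduce
at rate 1 with a number of offspring $\chi$ satisfying

\[ E(r^{\chi}) = \frac{(1 - r)^{\alpha} - 1 + \alpha r}{\alpha - 1}. \tag{21} \]

More explicitly,

\[ P(\chi = k) = \frac{\alpha (2 - \alpha)(3 - \alpha) \cdots (k - 1 - \alpha)}{k!}, \qquad k \ge 2, \]

and $P(\chi = k) = 0$ for $k \in \{0, 1\}$.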
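
As a sanity check on Proposition 19, the following sketch compares the power series of the offspring law with a closed-form generating function. It assumes the Gamma-function form $P(\chi = k) = \alpha\Gamma(k-\alpha)/(k!\,\Gamma(2-\alpha))$ quoted at the start of Section 6, and it assumes that (21) reads $E(r^\chi) = ((1-r)^\alpha - 1 + \alpha r)/(\alpha - 1)$; the function names are ours.

```python
import math


def offspring_pmf(k, alpha):
    """P(chi = k) = alpha * Gamma(k - alpha) / (k! * Gamma(2 - alpha)) for k >= 2, else 0."""
    if k < 2:
        return 0.0
    log_p = (math.log(alpha) + math.lgamma(k - alpha)
             - math.lgamma(k + 1) - math.lgamma(2 - alpha))
    return math.exp(log_p)


def pgf_series(r, alpha, kmax=400):
    """Truncated power series sum_k P(chi = k) r^k (accurate for r well below 1)."""
    return sum(offspring_pmf(k, alpha) * r ** k for k in range(2, kmax + 1))


def pgf_closed_form(r, alpha):
    """Assumed closed form of E(r^chi), as in (21)."""
    return ((1 - r) ** alpha - 1 + alpha * r) / (alpha - 1)


for alpha in (1.2, 1.5, 1.8):
    for r in (0.3, 0.7):
        print(alpha, r, pgf_series(r, alpha), pgf_closed_form(r, alpha))
```

Differentiating the closed form at $r = 1$ gives $\alpha/(\alpha - 1) = 1 + 1/(\alpha - 1)$, the mean used in Lemma 21.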

Proof. We show how this result follows from a result in Duquesne and
Le Gall [17]. To simplify, we will assume that $u = 1$. By the remark following
Theorem 2.7.1 of [17], the time of the first split $\gamma$ in $\theta^1(t)$ is a uniform random
variable on $(0, 1)$. Then, conditionally on $\gamma = t$ and $\theta^1(\gamma) = k$, the process
$(\theta^1(\gamma + s), 0 \le s < 1 - t)$ is distributed as the sum of $k$ independent copies of
$(\theta^{1-t}(s), 0 \le s < 1 - t)$. In particular, if we follow a branch in the tree from level 0 to level
1, we see that the times at which the corresponding individual reproduces
are distributed according to the standard "stick-breaking" construction of
a Poisson-Dirichlet random variable, described as follows. A first cut point
is selected uniformly at random in $(0, 1)$ and the left piece is discarded.
Another point is selected uniformly in the right piece. Discarding the left
piece, we proceed further by selecting a point uniformly in the piece left
after the second cut, and so on. It is well known and easy to see that the
image of these points by the map $t \mapsto -\ln(1 - t)$ is a standard Poisson
process. The distribution of the number of offspring at each branch point is
naturally given by the law of the random variable $\theta^1(\gamma)$, whose distribution
is identified in the remark following Theorem 2.7.1 of [17]. This implies the
proposition. □
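
The stick-breaking step in the proof above can be illustrated by simulation. The sketch below (our own illustration, with a fixed seed and hypothetical function names) draws successive cut points and maps them through $t \mapsto -\ln(1 - t)$; if the images form a standard Poisson process, the gaps between consecutive images should be i.i.d. exponential with mean 1.

```python
import math
import random


def stick_breaking_cuts(n, rng):
    """Return n successive cut points in (0, 1); each is uniform in the piece right of the previous cut."""
    cuts, left = [], 0.0
    for _ in range(n):
        left = left + (1.0 - left) * rng.random()
        cuts.append(left)
    return cuts


rng = random.Random(0)
gaps = []
for _ in range(5000):
    # Image of the cut points under t -> -ln(1 - t)
    times = [-math.log(1.0 - c) for c in stick_breaking_cuts(4, rng)]
    gaps.extend(b - a for a, b in zip([0.0] + times[:-1], times))

mean_gap = sum(gaps) / len(gaps)
print("empirical mean gap:", mean_gap)
```

The reason this works is that $1 - T_n$ is a product of $n$ independent uniform variables, so $-\ln(1 - T_n)$ is a sum of $n$ independent standard exponentials.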

Remark 20. We also present an intuitive, but less precise, argument for
why $(\theta^1(1 - e^{-t}); t \ge 0)$ is a Galton-Watson process (in the case of a stable
branching mechanism). We recall that the height process above level $a$ is independent of the
process below level $a$ conditionally given the local time at level $a = 1 - e^{-t}$, and the
excursions are given by the points of a Poisson point process with intensity
$dl \times N(de)$; see (18) for a precise formulation. In particular, given that $k$ of
them reach level 1, they are $k$ independent realizations of $N_{(e^{-t})}$. This proves
the independence of $\theta^1(1 - e^{-(t+s)})$ with respect to its past, conditionally
given $\theta^1(1 - e^{-t})$. Moreover, the law of each of these $k$ subtrees is identical
to that of the whole tree. Indeed, the descendants at level $1 - e^{-(t+s)}$ of
some excursion above $1 - e^{-t}$ reaching 1 is identical in law, after scaling
the vertical axis by $e^{-t}$, to the descendants at level $1 - e^{-s}$ of an excursion
above level 0 reaching level 1. [Recall that because the branching mechanism
is stable, the height process has the following scaling property: if $(H_s, s \ge 0)$
is the height process under the measure $N_{(1)}$, then $H^{(u)} = (u H_{s u^{-\alpha/(\alpha-1)}}, s \ge 0)$
is a realization of $N_{(u)}$.] This proves that $|\mathbf{T}^1(1 - e^{-t})|$ is a Galton-Watson
process. Observe, however, that this scaling argument does not give the
reproduction rate of individuals, nor the exact offspring distribution.

We conclude this section by observing that the Galton-Watson process
$(\theta^u(u(1 - e^{-t})), t \ge 0)$ satisfies the conditions needed to apply the celebrated
Kesten-Stigum theorem. More precisely, we have the following lemma.

Lemma 21. There exists a random variable $W$ with $W > 0$ almost surely
such that

\[ e^{-t/(\alpha-1)}\, \theta^u(u(1 - e^{-t})) \to W \quad \text{a.s. when } t \to \infty. \tag{22} \]

Proof. It can be checked that the reproduction law $\chi$ has mean $m =
1 + 1/(\alpha - 1)$. The Galton-Watson process is thus supercritical. Moreover,
$P(\chi \ge k)$ decays like $k^{-\alpha}$ and in particular, $E(\chi \log \chi) < \infty$, so we may apply
to this supercritical Galton-Watson process the Kesten-Stigum theorem (in
continuous time) [5], Theorem 7.1. □

5.2. Proof of Theorem 4 (number of blocks). We will now show that the
variable $W$ in (22) above is a quantity which can be expressed in terms of
the local time at level $u$. We start by focusing on the case $u = 1$ and we work
under the measure $N_{(1)}$.

First, we need a simple continuity lemma for the local time at level 1
under $N_{(1)}$. Let $Z_t^{(1)}$ denote the total local time of the process $H$ at level $t$.

Lemma 22. Under $N_{(1)}$, $Z_t^{(1)}$ is continuous at $t = 1$, that is, $Z_{1-}^{(1)} = Z_1^{(1)}$,
$N_{(1)}$-a.s.

Proof. When $Z_t$ is the local time at level $t$ of $(H_s, s \le T_1)$, then it is
well known that $Z_t$ cannot have a discontinuity at level 1 (indeed, $Z_t$ is a
CSBP started at $Z_0 = 1$, hence it is a Feller process and so cannot have a
fixed discontinuity). Conditionally on

\[ \#\mathrm{exc}_{0,1} = 1, \]

the excursion that reaches 1 is a realization of $N_{(1)}$ and as $\#\mathrm{exc}_{0,1}$ is Poissonian,
this event has strictly positive probability. Hence, the result follows.
□

We now give the interpretation of $W$ in terms of $Z_1^{(1)}$.

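Lemma 23. Let $K = (\alpha - 1)^{-1/(\alpha-1)}$ and let $u > 0$. Under $N_{(u)}$, we have

\[ \varepsilon^{1/(\alpha-1)}\, \theta^u(u(1 - \varepsilon)) \to K u^{-1/(\alpha-1)} Z_u^{(u)} \quad \text{a.s.} \]

as $\varepsilon \to 0$, where $Z_u^{(u)}$ denotes the local time of $H$ at level $u$.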

Remark 24. This result is thus a generalization of Lévy's result for the
local time of Brownian motion as the limit of the rescaled "downcrossing
number" (see, e.g., [47]). A similar result on the upcrossing number also
exists and is, in fact, much simpler than the one we prove here due to the
existence of an excursion theory above a fixed level.

Proof. For simplicity, we will prove this result assuming that $u = 1$,
but the case of general $u$ follows exactly the same arguments. We thus wish
to prove that

\[ \varepsilon^{1/(\alpha-1)}\, \theta^1(1 - \varepsilon) \to K Z_1^{(1)} \]

as $\varepsilon \to 0$. We already know, by Lemma 21, that $\varepsilon^{1/(\alpha-1)} \theta^1(1 - \varepsilon)$ converges
almost surely to $W$. Hence, it is enough to prove the convergence in probability
here to obtain that $W = K Z_1^{(1)}$ a.s. and to thereby conclude.

By excursion theory, conditionally on $Z_{1-\varepsilon}^{(1)} = z_\varepsilon$, the number of excursions
above $1 - \varepsilon$ that reach 1 is Poisson distributed with mean $z_\varepsilon N(\sup_{s \ge 0} H_s > \varepsilon)$.
Now, recall that by [17], Corollary 1.4.2 applied with $\psi(u) = u^{\alpha}$,

\[ N\Bigl(\sup_{s \ge 0} H_s > \varepsilon\Bigr) = (\alpha - 1)^{-1/(\alpha-1)} \varepsilon^{-1/(\alpha-1)} = K \varepsilon^{-1/(\alpha-1)} \]

[this is why the factor $u^{-1/(\alpha-1)}$ appears in the limit when $u \ne 1$ since, in
this case, we need to compute $N(\sup_{s \ge 0} H_s > u\varepsilon)$]. Let $\delta > 0$ and let us show
that

\[ P\bigl(\varepsilon^{1/(\alpha-1)} \theta^1(1 - \varepsilon) > K Z_1^{(1)} (1 + \delta)\bigr) \to 0 \tag{23} \]

as $\varepsilon \to 0$. To do this, note that this is smaller than

\[ P\bigl(|Z_1^{(1)} - Z_{1-\varepsilon}^{(1)}| > K Z_1^{(1)} \delta/2\bigr)
 + P\bigl(\varepsilon^{1/(\alpha-1)} \theta^1(1 - \varepsilon) > K Z_{1-\varepsilon}^{(1)} (1 + \delta/2)\bigr). \tag{24} \]

The first term converges to 0 by continuity of $Z$ at level 1. On the other
hand, Markov's inequality implies that if $X$ is a Poisson random variable
with mean $m/\varepsilon$, then for every $\lambda > 0$,

\[ P(\varepsilon X > m(1 + x)) \le \exp\Bigl(\frac{m}{\varepsilon}\bigl(-1 + e^{\lambda} - \lambda(1 + x)\bigr)\Bigr). \]

By choosing $\lambda > 0$ sufficiently close to 0, we can find $c > 0$ such that

\[ P(\varepsilon X > m(1 + x)) \le \exp(-cm/\varepsilon). \]

Therefore, the second term in (24) is bounded from above by

\[ E\bigl(\exp(-c' Z_{1-\varepsilon}^{(1)} \varepsilon^{-1/(\alpha-1)})\bigr) \to 0 \]

for some $c' > 0$, by Lebesgue's dominated convergence theorem, since $Z_{1-\varepsilon}^{(1)} \to
Z_1^{(1)}$ a.s. This gives the convergence in probability for the lemma. □
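
The exponential Markov (Chernoff) bound for Poisson variables used in the proof above can be checked numerically. The sketch below is an illustration under our own choice of parameters ($m = 1$, $x = 0.5$, $\lambda = \log(1 + x)$); the function names are hypothetical. It compares the exact tail $P(\varepsilon X > m(1+x))$, for $X$ Poisson with mean $m/\varepsilon$, with the bound $\exp\bigl((m/\varepsilon)(-1 + e^{\lambda} - \lambda(1+x))\bigr)$.

```python
import math


def poisson_upper_tail(mean, threshold, n_terms=2000):
    """P(X > threshold) for X ~ Poisson(mean), summing the pmf in log space."""
    k0 = math.floor(threshold) + 1
    return sum(math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
               for k in range(k0, k0 + n_terms))


def chernoff_bound(m, eps, x, lam):
    """exp((m/eps) * (-1 + e^lam - lam*(1+x))), an upper bound on P(eps*X > m*(1+x))."""
    return math.exp((m / eps) * (-1.0 + math.exp(lam) - lam * (1.0 + x)))


m, x = 1.0, 0.5
lam = math.log(1.0 + x)                          # minimizes the exponent over lam > 0
c = -(-1.0 + math.exp(lam) - lam * (1.0 + x))    # the constant c in exp(-c*m/eps)
results = []
for eps in (0.5, 0.1, 0.02):
    tail = poisson_upper_tail(m / eps, m * (1.0 + x) / eps)
    results.append((eps, tail, chernoff_bound(m, eps, x, lam)))
    print(results[-1])
```

With this choice of $\lambda$ the constant is $c = (1+x)\log(1+x) - x > 0$, so the bound decays exponentially in $1/\varepsilon$, which is what the dominated convergence step relies on.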

We note that the case $u \ne 1$ can also be obtained from the case $u = 1$ by
using scaling properties of the process $H$: if $(H_s, s \ge 0)$ is the height process
under the measure $N_{(1)}$, then $H^{(u)} = (u H_{s u^{-\alpha/(\alpha-1)}}, s \ge 0)$ is a realization of
$N_{(u)}$ (see, e.g., the remark before Theorem 3.3.3 of [17]).

Lemma 25. Assume that $\theta^u(t)$ is obtained for $0 \le t < u$ from the reduced
tree associated with the process $(H_t, 0 \le t \le T_r)$. Then

\[ \lim_{t \to 0} t^{1/(\alpha-1)}\, \theta^u(u(1 - t)) = K u^{-1/(\alpha-1)} Z_u \quad \text{a.s.} \tag{25} \]

Proof. This is a simple extension of Lemma 23. Again, to simplify,
assume that $u = 1$. There is a slight difference with Lemma 23, because this
was stated under the measure $N_{(1)}$, whereas here, $\theta^1$ is defined from the
height process $(H_s, s \le T_r)$ and not a realization of $N_{(1)}$. However, this does
not change the limit result, since the excursions of $(H_s, s \le T_r)$ reaching 1 are
independent and distributed with law $N_{(1)}$ (note that the result is trivially
true when no excursion reaches level 1). Therefore, the result remains the
same. □

The point of the next lemma is to show that any almost sure property
of the tree $\mathbf{T}^u$ still holds almost surely when the fixed level $u$ is replaced by
the random level $R^{-1}(t)$ if we choose $t$ outside a deterministic set of Lebesgue
measure 0. By convention, if $\mathbf{T}^u$ is empty (i.e., if $\sup_{0 \le s \le T_r} H_s < u$), we
declare any property to be true by default. Since we wish to study the
property $A_u$ at level $u = R^{-1}(t)$ for some $t$ and $\mathbf{T}^u$ is never empty at such a level, this will
never play any role.

Lemma 26. Let $A_u$ be a property of the tree $\mathbf{T}^u$ such that for every $u > 0$,
$P(A_u \mid \sup_{0 \le s \le T_r} H_s > u) = 1$. Then the set of $t$ such that $P(A_{R^{-1}(t)}) < 1$ has
zero Lebesgue measure.

Proof. Let $F$ be the set of $t$ such that $A_t$ fails. By Fubini's theorem,

\[ E\Bigl[\int_0^\infty \mathbf{1}_{\{t \in F\}}\, dt\Bigr] = 0. \]

Therefore, $\mathrm{Leb}(F) = 0$ a.s. On the other hand, $t \mapsto R(t)$ is almost surely
an absolutely continuous function. Indeed, it has a derivative at all points
where $Z$ is continuous and $Z$ has only countably many discontinuities a.s.
Therefore, $R(F)$ also has zero Lebesgue measure almost surely. Hence,

\[ \int_0^\infty \mathbf{1}_{\{R^{-1}(t) \in F\}}\, dt = 0 \quad \text{a.s.} \]

By taking expectations, we see that

\[ \int_0^\infty P(R^{-1}(t) \in F)\, dt = 0, \]

which proves the claim. □

The point is that the set $F'$ of $t$ such that $A$ fails at $R^{-1}(t)$ may be chosen
deterministically. If $t \notin F'$, then, with probability one, $A_{R^{-1}(t)}$ holds, even
though a priori we only knew this property for fixed, deterministic levels.

As a consequence of Lemma 26, we may choose a deterministic $t$ such
that the limit theorem for the number of vertices on $\mathbf{T}^u$ remains true for the
level $u = R^{-1}(t)$. For simplicity, we will assume that $t = 1$ is a valid choice,
and we write $\mathbf{T}_0 = \mathbf{T}^{R^{-1}(1)}$ for the tree which has a set of vertices at level $t$
($0 \le t \le 1$) given by the excursions above $R^{-1}(1)t$ that reach level $R^{-1}(1)$.
Hence, applying (25) with $u = R^{-1}(1)$,

\[ \lim_{t \to 0} t^{1/(\alpha-1)}\, |\mathbf{T}_0(1 - t)| = K\, R^{-1}(1)^{-1/(\alpha-1)} Z_{R^{-1}(1)} \quad \text{a.s.} \tag{26} \]

The only thing that remains to be considered is the behavior of $t \mapsto
R^{-1}(1 - t)$ when $t$ is small.

Lemma 27. As $t \to 0$, the following asymptotics hold almost surely:

\[ R^{-1}(1) - R^{-1}(1 - t) \sim t\, \frac{1}{\alpha(\alpha - 1)\Gamma(\alpha)}\, Z_{R^{-1}(1)}^{\alpha-1}, \]

meaning that the ratio of the two sides converges to 1 almost surely.

Proof. Let

\[ q = \frac{1}{\alpha(\alpha - 1)\Gamma(\alpha)}\, Z_{R^{-1}(1)}^{\alpha-1}. \tag{27} \]

The lemma follows simply from the fact that almost surely the function $R(t)$
is differentiable at $t = R^{-1}(1)$ since $Z$ is continuous at $R^{-1}(1)$. Its derivative
is given by

\[ \alpha(\alpha - 1)\Gamma(\alpha)\, Z_{R^{-1}(1)}^{1-\alpha} = q^{-1}, \]

which is nonzero almost surely. Therefore, $R^{-1}(t)$ is also differentiable at
$t = 1$ and its derivative is $q$. □

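Proof of Theorem 4. Now, to finish the proof, note that for $t \le 1$,

\[ N(t) = \theta^{R^{-1}(1)}(R^{-1}(1 - t)). \]

Since $R^{-1}(1 - t) = R^{-1}(1) - tq + o(t)$, by monotonicity of $\theta$ we see
that

\[ N(t) \sim \theta^{R^{-1}(1)}(R^{-1}(1) - tq). \]

On the other hand, by (26), we have

\[ \theta^{R^{-1}(1)}(R^{-1}(1) - tq) \sim K (tq)^{-1/(\alpha-1)} Z_{R^{-1}(1)}. \]

After cancellation, using (27) and $K = (\alpha - 1)^{-1/(\alpha-1)}$, we obtain that almost surely

\[ \lim_{t \to 0} t^{1/(\alpha-1)} N(t) = (\alpha \Gamma(\alpha))^{1/(\alpha-1)}, \]

as stated in Theorem 4. □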
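
The cancellation at the end of this proof is a short computation, and it can be checked numerically. The sketch below assumes $K = (\alpha - 1)^{-1/(\alpha-1)}$ (as suggested by the display for $N(\sup_s H_s > \varepsilon)$ in the proof of Lemma 23) and $q = Z^{\alpha-1}/(\alpha(\alpha-1)\Gamma(\alpha))$ from (27), with $z$ standing for the local time $Z_{R^{-1}(1)}$; under these assumptions the coefficient of $t^{-1/(\alpha-1)}$ should not depend on $z$. The function names are ours.

```python
import math


def block_count_constant(alpha, z):
    """Coefficient of t^{-1/(alpha-1)} obtained by combining (26) and (27): K * q^{-1/(alpha-1)} * z."""
    K = (alpha - 1.0) ** (-1.0 / (alpha - 1.0))
    q = z ** (alpha - 1.0) / (alpha * (alpha - 1.0) * math.gamma(alpha))
    return K * q ** (-1.0 / (alpha - 1.0)) * z


def expected_constant(alpha):
    """(alpha * Gamma(alpha))^{1/(alpha-1)}, which is free of the local time z."""
    return (alpha * math.gamma(alpha)) ** (1.0 / (alpha - 1.0))


for alpha in (1.2, 1.5, 1.9):
    for z in (0.3, 2.0, 10.0):
        print(alpha, z, block_count_constant(alpha, z), expected_constant(alpha))
```

That the random local time drops out is the reason the limit in Theorem 4 is deterministic.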

5.3. Evans' metric space and multifractal spectrum. We begin with
a description of the basic setup for this section, which is Evans' random
metric space $S$. This space was introduced by Evans in [23] in the case
of Kingman's coalescent, and some properties of $S$ (such as its Hausdorff
and packing dimensions) were derived in [8] in the case of a Beta$(2-\alpha, \alpha)$-coalescent
and other coalescents behaving similarly (see [8], Theorem 1.7).
The space $S$ is defined as the completion of $\mathbb{N}$ for the distance $d_S$ which is
defined on $\mathbb{N}$ by

\[ d_S(i, j) = \inf\{t : i \sim_{\Pi(t)} j\}, \]

that is, $d_S(i, j)$ is the collision time of $i$ and $j$. Observe that $d_S$ is, in fact,
an ultrametric, both on $\mathbb{N}$ and $S$, that is,

\[ d_S(x, z) \le d_S(x, y) \vee d_S(y, z) \qquad \forall x, y, z \in S. \]

The space $(S, d_S)$ is complete by definition and hence it is compact as soon
as $\Pi(t)$ comes down from infinity. Indeed, for each $t > 0$, one needs only
$N(t) < \infty$ balls of diameter $t$ to cover it, which implies that $S$ is precompact.
Together with completeness, this makes the space $S$ compact. Given $B \subset S$,
we write $\operatorname{cl} B$ or $\bar{B}$ for its closure (with respect to $d_S$). Let $I_i(t) := \min\{j \in
B_i(t)\}$ be the least element of $B_i(t)$. Then the set

\[ \begin{aligned}
U_i(t) &= \operatorname{cl} B_i(t) \\
 &= \operatorname{cl}\{j \in \mathbb{N} : j \sim_{\Pi(t)} I_i(t)\} \\
 &= \operatorname{cl}\{j \in \mathbb{N} : d(j, I_i(t)) \le t\} \\
 &= \{y \in S : d(y, I_i(t)) \le t\}
\end{aligned} \]

is a closed ball with diameter at most $t$. The closed balls of $S$ are also the
open balls of this space and every ball is of the form $U_i(t)$. In particular, it
is easily seen that the collection of balls is countable. For $x \in S$ and $t > 0$,
we write $B_x(t)$ for the ball of center $x$ and diameter $t$ [observe that in the
case $x \in \mathbb{N}$, this notation is consistent with the blocks convention for $\Pi(t)$].
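
The ultrametric property of the collision-time distance holds for any coalescing partition process, not only for Beta-coalescents. The sketch below illustrates this on a toy coalescent with multiple mergers on a finite set; the merger rule and waiting times are arbitrary choices of ours (they are not the Beta-coalescent rates), and the function names are hypothetical.

```python
import itertools
import random


def collision_times(n, rng):
    """Simulate a toy coalescent with multiple mergers on {0,...,n-1}; return the collision-time matrix."""
    blocks = [{i} for i in range(n)]
    d = [[0.0] * n for _ in range(n)]
    t = 0.0
    while len(blocks) > 1:
        t += rng.expovariate(1.0)              # arbitrary waiting time (toy rates)
        m = rng.randint(2, len(blocks))        # number of blocks merging at once
        chosen = rng.sample(range(len(blocks)), m)
        merging = [blocks[c] for c in chosen]
        # Elements of distinct merging blocks collide at time t.
        for a, b in itertools.combinations(merging, 2):
            for i in a:
                for j in b:
                    d[i][j] = d[j][i] = t
        merged = set().union(*merging)
        blocks = [b for idx, b in enumerate(blocks) if idx not in chosen] + [merged]
    return d


def is_ultrametric(d):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples."""
    n = len(d)
    return all(d[x][z] <= max(d[x][y], d[y][z])
               for x in range(n) for y in range(n) for z in range(n))


d = collision_times(8, random.Random(1))
print(is_ultrametric(d))
```

The inequality holds because once $x$ and $y$ share a block and $y$ and $z$ share a block, all three are in the same block, so $x$ and $z$ must have collided by the later of the two times.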

It is possible (see [23]) to define almost surely a random measure $\eta(\cdot)$ on
$S$ by requiring that for all $i \in \mathbb{N}$ and all $t > 0$, the measure $\eta(U_i(t))$ is the
frequency of the block of $\Pi(t)$ containing $i$. We call $\eta$ the mass-measure or
the size-biased picking measure. Recall that for $\gamma \le 1/(\alpha - 1)$, the subset
$S(\gamma)$ of $S$ is defined as

\[ S(\gamma) = \Bigl\{ x \in S : \liminf_{r \to 0} \frac{\log \eta(B(x, r))}{\log r} \le \gamma \Bigr\}. \]

Results from [8] suggest that $\gamma = 1/(\alpha - 1)$ is the typical exponent for the
size of a block as time goes down to 0. Hence, here, we are looking for
existence of blocks whose size is abnormally large compared to the typical
size as time goes down to 0. The next result gives the precise value of the
Hausdorff dimension of this set (with respect to the distance on the space
$S$).

The key idea for the proof of Theorem 5 is the observation that the space
$S$, equipped with its mass measure $\eta$, can be thought of as the boundary of
some Galton-Watson tree [more precisely, the reduced tree at level $R^{-1}(t)$]
with the associated branching measure. Hence, the multifractal spectrum
of $\eta$ in $S$ is the same as the multifractal spectrum of the branching measure
in the boundary of a supercritical Galton-Watson tree. The case where
the offspring distribution is heavy-tailed and has infinite variance has been
recently studied by Mörters and Shieh [41] and we can use their result to
conclude. For basic properties of the branching measure of a Galton-Watson
tree, we recommend the references [37, 38, 41].

Recall that $\mathbf{T}^u$ designates the reduced tree at level $u$, that is, it is the
tree where, for each level $0 \le t \le 1$, each vertex at level $t$ corresponds to
one excursion of $H$ above level $ut$ that reaches level $u$. For our purposes, we
eventually wish to work under the law of $(H(s), 0 \le s \le T_r)$ (conditionally on
the event $\sup_{s \le T_r} H_s > u$, otherwise the tree is empty), but it will sometimes
be more convenient to use $N_{(u)}(\cdot)$, the excursion measure conditioned to hit
level $u$. The difference is, of course, that in the latter case, $\mathbf{T}^u$ is a tree with
a single ancestor, while in the former case, $\mathbf{T}^u$ is actually a collection of a
Poissonian number of i.i.d. trees joined at the root. These trees have the
distribution of the reduced tree under $N_{(u)}(\cdot)$. We emphasize that for this
study of the multifractal spectrum, this does not create any real difference.

By definition, a ray of $\mathbf{T}^u$ is a path $(\zeta(t), 0 \le t < 1)$ such that $\zeta(0)$ is the
root, for every $t$, $\zeta(t)$ is a vertex at level $t$ in $\mathbf{T}^u$ and for all $s \le t$, $\zeta(s)$ is an
ancestor of $\zeta(t)$. Then the boundary of the tree $\mathbf{T}^u$, denoted $\partial\mathbf{T}^u$, is just the
set of all rays. The boundary can be equipped with a metric $\mathrm{dist}_{\partial\mathbf{T}}$ by
letting $\mathrm{dist}_{\partial\mathbf{T}}(U, V) = 1 - t$ if $t$ is the height at which $U$ and $V$ diverge. Let
$|\mathbf{T}^u(t)| := \theta^{(u)}(ut)$ be the size of generation at level $t$. By Proposition 19, we
see that $(|\mathbf{T}^u(1 - e^{-t})|, t \ge 0)$ is a continuous-time Galton-Watson process
where individuals live for an exponential time with parameter 1 and then
reproduce with offspring distribution $\chi$. Recall from Lemma 21 that there
is a random variable $W > 0$ almost surely such that

\[ W = \lim_{t \to \infty} e^{-t/(\alpha-1)} |\mathbf{T}^u(1 - e^{-t})|. \]

Furthermore, for every vertex $v \in \mathbf{T}^u$, we can define $\mathbf{T}^u(v)$, the subtree
rooted at $v$, and $W(v)$, the limit (which exists almost surely) of its associated
martingale. As there are countably many branching points of $\mathbf{T}^u$, this allows
one to build a natural measure $\mu$, called the branching measure on $\partial\mathbf{T}^u$, by
introducing the requirement

\[ \mu(\{\zeta \in \partial\mathbf{T}^u : \zeta(1 - e^{-t}) = v\}) = e^{-t/(\alpha-1)} W(v). \tag{28} \]

Observe that the set on the left-hand side is a ball of radius $e^{-t}$ centered on
any ray $\zeta$ such that $\zeta(1 - e^{-t}) = v$. Having defined $\mu$ on arbitrary balls of
the boundary of the tree $\partial\mathbf{T}^u$, this uniquely extends to a measure $\mu$ which is
defined on arbitrary subsets of $\partial\mathbf{T}^u$ by Carathéodory's Extension theorem
(see page 438 of [19]).

When $u > 0$ is a fixed deterministic level, $\mathbf{T}^u$ is a collection of Galton-Watson
trees. The definitions introduced above then coincide with the standard
notions of distance, boundary and branching measure for a collection
of Galton-Watson trees. The lemma below is essentially a reformulation of
Theorems 2.1 and 2.2 in [41] within our framework.



Lemma 28. Conditionally on $\sup_{0 \le s \le T_r} H_s > u$, the multifractal spectrum
of $\mu$ is given as follows: for all $1/\alpha \le \gamma \le 1/(\alpha - 1)$,

\[ \dim_{\mathcal{H}} \Bigl\{ v \in \partial\mathbf{T}^u : \liminf_{r \to 0} \frac{\log \mu(B(v, r))}{\log r} \le \gamma \Bigr\} = \gamma\alpha - 1 \]

and the set is empty if $\gamma < 1/\alpha$. If $1/(\alpha - 1) \le \gamma \le \alpha/(\alpha - 1)^2$, then

\[ \dim_{\mathcal{H}} \Bigl\{ v \in \partial\mathbf{T}^u : \limsup_{r \to 0} \frac{\log \mu(B(v, r))}{\log r} \ge \gamma \Bigr\} = \frac{\alpha}{\gamma(\alpha - 1)^2} - 1 \]

and the set is empty when $\gamma > \alpha/(\alpha - 1)^2$.

Proof. First, we remark that it suffices to prove this result under the
measure $N_{(u)}$. Moreover, it is elementary to check that

\[ \Bigl\{ \zeta \in \partial\mathbf{T}^u : \liminf_{t \to \infty} \frac{\log \mu(B(\zeta, e^{-t}))}{-t} \le \gamma \Bigr\}
 = \Bigl\{ \zeta \in \partial\mathbf{T}^u : \liminf_{n \to \infty} \frac{\log \mu(B(\zeta, e^{-n}))}{-n} \le \gamma \Bigr\} \]

and that

\[ \Bigl\{ \zeta \in \partial\mathbf{T}^u : \limsup_{t \to \infty} \frac{\log \mu(B(\zeta, e^{-t}))}{-t} \ge \gamma \Bigr\}
 = \Bigl\{ \zeta \in \partial\mathbf{T}^u : \limsup_{n \to \infty} \frac{\log \mu(B(\zeta, e^{-n}))}{-n} \ge \gamma \Bigr\}. \]

Sampling at these discrete times gives us a discrete-time Galton-Watson
process which satisfies the assumptions of Theorems 2.1 and 2.2 of [41]. Its
offspring variable is given by

\[ \chi_{\mathrm{discrete}} := |\mathbf{T}^u(1 - e^{-1})|. \]

Observe that, by construction, $P(\chi_{\mathrm{discrete}} = 0) = 0$ and $P(\chi_{\mathrm{discrete}} = 1) < 1$.
Furthermore, it is easily seen that $E(\chi_{\mathrm{discrete}}) = e^{1/(\alpha-1)}$. By [5], Corollary
2, Chapter III.6, the offspring variable $\chi_{\mathrm{discrete}}$ in discrete time and $\chi$ satisfy
the $X \log X$ condition simultaneously, so

\[ E(\chi_{\mathrm{discrete}} \log \chi_{\mathrm{discrete}}) < \infty. \]

The last step is to check the values of the two constants

\[ \tau := -\log(P(\chi_{\mathrm{discrete}} = 1)) / \log(E(\chi_{\mathrm{discrete}})) \]

and

\[ r := \liminf_{x \to \infty} \frac{-\log P(\chi_{\mathrm{discrete}} > x)}{\log x}. \]

Note that $\chi_{\mathrm{discrete}} = 1$ occurs if the ancestor has not reproduced by time 1.
Since the time at which she reproduces is, on this timescale, an exponential
random variable with mean 1, we see that $P(\chi_{\mathrm{discrete}} = 1) = e^{-1}$, so $\tau =
-(\alpha - 1)\log(e^{-1}) = \alpha - 1$. To compute $r$ requires a few more arguments.
Now, it is known (see [37], (3.1) and (3.2)) that $r$ is equal to

\[ \sup\{a > 0 : E(\chi_{\mathrm{discrete}}^{a}) < \infty\}. \]

On the other hand, by [5], Corollary 1, Chapter III.6, for all $a > 1$, $E(\chi_{\mathrm{discrete}}^{a}) <
\infty$ if and only if $E(\chi^{a}) < \infty$. Using (10), we see that $\chi$ admits moments of
order up to and excluding $\alpha$, therefore $r = \alpha$. Application of Theorems 2.1
and 2.2 of [41] concludes the proof of the lemma. □

The proof of Theorem 5 is now straightforward. We show that the multifractal
spectrum of $\eta$ in $S$ with respect to the metric $d_S$ is necessarily the
same as the multifractal spectrum of $\mu$ in $\partial\mathbf{T}$ with respect to $\mathrm{dist}_{\partial\mathbf{T}}$.

Proof of Theorem 5. Let $\mathbf{T}$ be the tree whose vertices at level $t$
consist of those excursions above level $R^{-1}(t)$ that reach level $R^{-1}(1)$. As
above, the boundary $\partial\mathbf{T}$ of the tree $\mathbf{T}$ is just the set of all "infinite" paths,
that is, of paths $(\zeta(t), 0 \le t < 1)$ such that for every $t$, $\zeta(t)$ is at level $t$ of $\mathbf{T}$.
We may equip $\partial\mathbf{T}$ with the following metric: the distance between two rays
$\zeta$ and $\zeta'$ is simply

\[ \mathrm{dist}_{\partial\mathbf{T}}(\zeta, \zeta') = 1 - \sup\{t \le 1 : \zeta(t) = \zeta'(t)\}. \]

There is a one-to-one map $\Phi$ between $S$ and $\partial\mathbf{T}$ which can be described as
follows: let $\zeta \in \partial\mathbf{T}$, then for each $t \in (0, 1)$, the vertex $\zeta(1 - t)$ corresponds,
by definition, to an excursion above $R^{-1}(1 - t)$ that hits level $R^{-1}(1)$ and
hence to a block $B_\zeta(t)$ of the partition $\Pi(t)$, where $\Pi$ is the embedded
coalescent process. When $t < t'$, $B_\zeta(t) \subset B_\zeta(t')$. Define $i(t) := \min B_\zeta(t)$, the
least element of the block that corresponds to the vertex $\zeta(1 - t)$. Note that
the function $i(t)$ satisfies the Cauchy criterion (with respect to the metric
$d_S$) as $t \to 0$, by construction. Since $S$ is a complete metric space under
$d_S$, it follows that there is a unique $x \in S$ such that $d_S(x, i(t)) \to 0$ when
$t \to 0$. We put $\Phi^{-1}(\zeta) = x$. In the converse direction, since $\mathbb{N}$ is dense in $S$,
for any $x \in S$, we may consider a sequence $(i_n, n = 1, 2, \ldots)$ in $\mathbb{N}$ such that
$d_S(i_n, x) \to 0$ when $n \to \infty$. Without loss of generality, we may assume that
$d_S(i_n, x)$ is monotone decreasing. Then the sequence of blocks $B(i_n, t_n)$ that
contain $i_n$ at time $t_n = d_S(i_n, x)$ defines a unique ray $\zeta_x$ such that $\zeta_x(1 - t_n)$
corresponds to $B(i_n, t_n)$ for each $n$. Moreover, $\zeta_x$ does not depend on the
particular sequence $i_n$ converging to $x$, so we may unambiguously define
$\Phi(x) = \zeta_x$. It is easy to see that $\Phi(\Phi^{-1}(\zeta)) = \zeta$. For instance, this map $\Phi$
acts on the integers as follows: $\forall i \in \mathbb{N}$, $\Phi(i)$ is the unique ray $(\zeta(t), t > 0)$ such
that for each $t > 0$, the integer $i$ is in the block of $\Pi(t)$ which corresponds
to $\zeta(t)$.

Hence, we may identify $S$ with $\partial\mathbf{T}$ and note that, by construction, distances
are preserved in this identification:

\[ d_S(x, y) = \mathrm{dist}_{\partial\mathbf{T}}(\Phi(x), \Phi(y)). \]

Furthermore, if $z$ is a vertex at level $t$ of $\mathbf{T}$, let $\ell(z)$ be the total local time at
level $R^{-1}(1)$ of the excursion defining $z$, divided by $Z_{R^{-1}(1)}$, the total local
time of the whole process $(H_s, s \le T_r)$ at level $R^{-1}(1)$. The correspondence
between local time at level $R^{-1}(1)$ and asymptotic frequencies of the blocks
of $\Pi(t)$ implies that

\[ \eta(B(x, t)) = \ell(\zeta_x(t)), \tag{29} \]

where $\zeta_x(t)$ is the vertex corresponding to $B(x, t)$, that is, the vertex at level
$t$ on the ray $\zeta_x$. Hence, as the map $\Phi$ preserves the distance, it is easy to see
that $\dim_{\mathcal{H}} S(\gamma) = \dim_{\mathcal{H}} S'(\gamma)$, where

\[ S'(\gamma) = \Bigl\{ \zeta \in \partial\mathbf{T} : \liminf_{t \to 0} \frac{\log \ell(\zeta(t))}{\log t} \le \gamma \Bigr\} \tag{30} \]

because the two sets coincide via the map $\Phi$. Thus, we want to prove that
$\dim_{\mathcal{H}} S'(\gamma) = \gamma\alpha - 1$. On the other hand, recall that $\mathbf{T}$ is just a rescaling
of $\mathbf{T}_0$, which is the shorthand notation for the reduced tree at level $R^{-1}(1)$.
Recall that this tree has a set of vertices at level $t$ (for $0 \le t \le 1$) corresponding
to excursions above level $tR^{-1}(1)$ reaching $R^{-1}(1)$. Let us first
treat the case $\gamma \le 1/(\alpha - 1)$ of "thick points." By Lemma 27, we have that
$R^{-1}(1) - R^{-1}(1 - t) \sim tq$ (where, as before, $q$ denotes the random number
in Lemma 27), so it is enough to prove that $\dim_{\mathcal{H}} S'_0(\gamma) = \gamma\alpha - 1$, where
$S'_0(\gamma)$ is defined as in (30) with $\mathbf{T}_0$ in place of $\mathbf{T}$.

On the other hand, by Lemma 23, for a fixed level $u > 0$, the limit $W$ of the
Kesten-Stigum martingale associated with the reduced tree at level $u$ is a
constant multiple of the local time at $u$. Let $(\zeta(t), 0 \le t < 1)$ be a ray in $\partial\mathbf{T}^u$.
Applying this to the subtree rooted at $v = \zeta(t)$, it follows that the number
$W(v)$ defining the branching measure on $\partial\mathbf{T}^u$ is also a constant multiple
of the local time $\ell(v)$ at level $u$ enclosed in the excursion corresponding to
vertex $v$,

\[ W(v) = K (u e^{-t})^{-1/(\alpha-1)} \ell(\zeta(t)), \]

since the subtree rooted at $v$ has the law of $\mathbf{T}^{u e^{-t}}$. In other words, dividing
both sides by $e^{t/(\alpha-1)}$ and referring to (28), if $\mu$ is the branching measure
on $\partial\mathbf{T}^u$, then almost surely, for all $t > 0$,

\[ \mu(B(\zeta, e^{-t})) = K u^{-1/(\alpha-1)} \ell(\zeta(t)), \]

that is, the branching measure associated with a vertex $\zeta(t) = z \in \mathbf{T}^u$ is a
constant multiple of the local time $\ell(z)$ enclosed at level $u$ in the excursion
corresponding to $z$. Therefore, using Lemma 28, this implies that almost
surely, conditionally on the event $\sup_{0 \le s \le T_r} H_s > u$,

\[ \dim_{\mathcal{H}} \Bigl\{ \zeta \in \partial\mathbf{T}^u : \liminf_{t \to \infty} \frac{\log \ell(\zeta(t))}{-t} \le \gamma \Bigr\} = \gamma\alpha - 1. \]

We may therefore apply Lemma 26 to conclude that if $t \notin \mathcal{N}$, where $\mathcal{N}$ is
a deterministic set of Lebesgue measure zero, then this property also holds
for the reduced tree at level $R^{-1}(t)$. There is, of course, no loss of generality
in assuming that $1 \notin \mathcal{N}$, so we conclude that

\[ \dim_{\mathcal{H}} S'_0(\gamma) = \gamma\alpha - 1, \]

as required. When $\gamma > 1/(\alpha - 1)$, the proof follows the same lines and uses
the "thin points" part of Lemma 28. This concludes the proof of Theorem
5. □

6. Site and allele frequency spectrum. Our goal in this section is to
prove Theorem 9. Our proof relies heavily on the connection between Beta-coalescents
and Galton-Watson processes developed in the previous section.
Throughout this section, $(\xi_t, t \ge 0)$ will denote the continuous-time
Galton-Watson process where individuals live for an independent exponential
amount of time and then give birth to a number of offspring distributed
according to $\chi$, where $P(\chi = 0) = P(\chi = 1) = 0$ and, for $k \ge 2$,

\[ P(\chi = k) = \frac{\alpha(2 - \alpha)(3 - \alpha) \cdots (k - 1 - \alpha)}{k!} = \frac{\alpha \Gamma(k - \alpha)}{k!\, \Gamma(2 - \alpha)}. \]

This offspring distribution is supercritical, with mean $m = 1 + 1/(\alpha - 1)$.
Also, recall that $M_k(n)$ denotes the number of families of size $k$ in the
infinite sites model when the sample has $n$ individuals, and that $N_k(n)$ is
the equivalent quantity in the infinite alleles model.

6.1. Expected values. Suppose marks occur at times of a constant rate $\theta$
Poisson point process along the branches of a reduced tree at level 1 under
the measure $N_{(1)}$, so that the reduced tree has a single ancestor. Recall that
the number of branches of $\mathbf{T}^1$ at level $1 - e^{-t}$ is a Galton-Watson process.
Hence, after rescaling, this amounts to having mutation marks at intensity
$\theta e^{-s}$ per unit length at time $s$ on the Galton-Watson tree that comes from
the process $\xi$. We will stop the Galton-Watson process at a fixed time $t$. If
there is a mutation at time $s < t$, then we say that it creates a family of size
$k$ if the individual with the mutation at time $s$ has $k$ descendants alive in
the population at time $t$. Let $M_k^{GW}(t)$ denote the number of families of size
$k$ at time $t$. The following result shows that a simple calculation gives the
asymptotic behavior of $E[M_k^{GW}(t)]$. A sharper argument will be needed to
establish convergence in probability.

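Proposition 29. Let $\tau$ be an independent exponential random variable
with mean $1/c$, where

\[
c = \frac{2-\alpha}{\alpha-1}.
\]

We have

\[
\lim_{t\to\infty} e^{-ct} E[M_k^{GW}(t)] = \frac{\theta}{c}\, P(\xi_\tau = k).
\]

Proof. By applying the branching property and using the facts that
$E[\xi_t] = e^{(m-1)t}$ and that $m - 2 = c$ for the third equality, we obtain

\[
\begin{aligned}
E[M_k^{GW}(t)]
  &= \int_0^t P(\text{there is a mark in } dl \text{ that creates a family of size } k) \\
  &= \int_0^t E[\xi_l\,\theta e^{-l}]\,dl\; P(\xi_{t-l} = k) \\
  &= \theta \int_0^t e^{cl}\, P(\xi_{t-l} = k)\,dl \\
  &= \frac{\theta e^{ct}}{c} \int_0^t c\,e^{-cu}\, P(\xi_u = k)\,du.
\end{aligned}
\]

Multiplying both sides by $e^{-ct}$ and letting $t \to \infty$, we get

\[
\lim_{t\to\infty} e^{-ct} E[M_k^{GW}(t)]
  = \frac{\theta}{c} \int_0^\infty c\,e^{-cu}\, P(\xi_u = k)\,du
  = \frac{\theta}{c}\, P(\xi_\tau = k).
\]
□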
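For $k = 1$ the computation in the proof can be carried out by hand, which gives a useful check on the constants. The worked case below is a sketch that uses $P(\xi_u = 1) = e^{-u}$ (no birth before time $u$, since each individual gives birth at rate 1) together with the identity $c/(1+c) = 2 - \alpha$.

```latex
\[
E[M_1^{GW}(t)]
  = \theta \int_0^t e^{cl}\, e^{-(t-l)}\,dl
  = \frac{\theta\,\bigl(e^{ct} - e^{-t}\bigr)}{1+c},
\qquad
e^{-ct} E[M_1^{GW}(t)] \longrightarrow \frac{\theta}{1+c}
  = \frac{\theta}{c}\cdot\frac{c}{1+c}
  = \frac{\theta}{c}\,(2-\alpha).
\]
```

The limit agrees with $(\theta/c)\,P(\xi_\tau = 1)$, since $P(\xi_\tau = 1) = c/(1+c)$.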

To make the limiting expression for $E[M_k^{GW}(t)]$ more explicit, we now
calculate $P(\xi_\tau = k)$.

Lemma 30. For all positive integers $k$, we have

\[
P(\xi_\tau = k) = \frac{(2-\alpha)\,\Gamma(k+\alpha-2)}{\Gamma(\alpha-1)\,k!}.
\tag{31}
\]

Proof. We prove the result by induction. Note that $\xi_\tau = 1$ if and only
if there are no birth events before time $\tau$. Because $\tau$ has an exponential
distribution with rate parameter $c$ and individuals give birth at rate 1, it
follows that

\[
P(\xi_\tau = 1) = \frac{c}{1+c} = 2 - \alpha,
\]

which agrees with the right-hand side of (31) when $k = 1$.

Now, suppose that $k \ge 2$ and (31) is valid for $j = 1, \ldots, k-1$. Let
$r_k = P(\xi_t = k \text{ for some } t < \tau)$. By conditioning on the number of
individuals before there were $k$ individuals, we get

\[
r_k = \sum_{j=1}^{k-1} r_j \cdot \frac{j}{j+c}\, P(\chi = k-j+1)
\tag{32}
\]

because if there are $j < k$ individuals, then the probability of having another
birth before time $\tau$ is $j/(j+c)$ and if this happens, the probability that there
are $k$ individuals after the next birth is $P(\chi = k-j+1)$. If $\xi_t = k$ for some
$t < \tau$, then we will have $\xi_\tau = k$ if and only if $\tau$ occurs before the next birth
event. When there are $k$ individuals, birth events happen at rate $k$, so the
probability that $\tau$ happens before the next birth is $c/(k+c)$. Therefore,
$P(\xi_\tau = k) = c\,r_k/(k+c)$ and so $r_k = P(\xi_\tau = k)(k+c)/c$. Substituting this
into (32), we get

\[
P(\xi_\tau = k) = \frac{1}{k+c} \sum_{j=1}^{k-1} j\,P(\xi_\tau = j)\,P(\chi = k-j+1).
\tag{33}
\]

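Using the induction hypothesis and the fact that
$P(\chi = k) = \alpha\Gamma(k-\alpha)/(k!\,\Gamma(2-\alpha))$, we obtain

\[
P(\xi_\tau = k) = \frac{\alpha(2-\alpha)}{\Gamma(\alpha-1)\Gamma(2-\alpha)(k+c)}
  \sum_{j=1}^{k-1} \frac{\Gamma(j+\alpha-2)\,\Gamma(k-j+1-\alpha)}{(j-1)!\,(k-j+1)!}.
\]

Using the fact that $k + c = (k\alpha - k + 2 - \alpha)/(\alpha - 1)$ and letting
$\ell = j - 1$ in the sum, we get

\[
P(\xi_\tau = k) = \frac{\alpha(\alpha-1)(2-\alpha)}{(k\alpha-k+2-\alpha)\,\Gamma(\alpha-1)\Gamma(2-\alpha)}
  \sum_{\ell=0}^{k-2} \frac{\Gamma(\ell+\alpha-1)\,\Gamma(k-\ell-\alpha)}{\ell!\,(k-\ell)!}.
\tag{34}
\]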

If $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$, then by starting with the identity
$(1-x)^{-a}(1-x)^{-b} = (1-x)^{-(a+b)}$ and considering the $n$th order term in the
Taylor series expansion of both sides, we get (see, e.g., page 70 in [3])

\[
\sum_{k=0}^{n} \frac{(a)_k\,(b)_{n-k}}{k!\,(n-k)!} = \frac{(a+b)_n}{n!},
\]

where $(a)_k = a(a+1)\cdots(a+k-1)$. Since $(a)_k = \Gamma(a+k)/\Gamma(a)$, it follows
that

\[
\sum_{k=0}^{n} \frac{\Gamma(a+k)\,\Gamma(b+n-k)}{k!\,(n-k)!} = \frac{\Gamma(a)\Gamma(b)\,(a+b)_n}{n!}.
\tag{35}
\]



When $a + b = -1$, we have $(a+b)_n = 0$ for $n \ge 2$. Therefore, (35) with
$a = \alpha - 1$, $b = -\alpha$ and $n = k$ implies that the sum on the right-hand side
of (34) would be zero if it went up to $k$ rather than $k - 2$. It follows that the
sum up to $k - 2$ is equal to the negative of the sum of the terms when
$\ell = k$ and $\ell = k - 1$, which is

\[
-\frac{\Gamma(k+\alpha-2)\,\Gamma(1-\alpha)}{(k-1)!} - \frac{\Gamma(k+\alpha-1)\,\Gamma(-\alpha)}{k!}
  = \frac{\Gamma(2-\alpha)\,\Gamma(k+\alpha-2)\,(k\alpha-k+2-\alpha)}{k!\,\alpha(\alpha-1)}.
\]

Combining this result with (34) gives (31). The lemma follows by induction.
□
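The induction can also be checked numerically. The sketch below solves recursion (33) forward from $P(\xi_\tau = 1) = c/(1+c)$ and compares the result with the closed form (31). The function names and the choice $\alpha = 1.5$ are illustrative assumptions, not part of the original argument.

```python
import math

def offspring_pmf(alpha, k):
    """P(chi = k) = alpha * Gamma(k - alpha) / (k! * Gamma(2 - alpha)) for k >= 2."""
    if k < 2:
        return 0.0
    return alpha * math.gamma(k - alpha) / (math.factorial(k) * math.gamma(2 - alpha))

def closed_form(alpha, k):
    """Right-hand side of (31)."""
    return (2 - alpha) * math.gamma(k + alpha - 2) / (math.gamma(alpha - 1) * math.factorial(k))

def by_recursion(alpha, kmax):
    """Solve recursion (33) forward, starting from P(xi_tau = 1) = c / (1 + c)."""
    c = (2 - alpha) / (alpha - 1)
    p = {1: c / (1 + c)}
    for k in range(2, kmax + 1):
        total = sum(j * p[j] * offspring_pmf(alpha, k - j + 1) for j in range(1, k))
        p[k] = total / (k + c)
    return p

alpha = 1.5  # illustrative value in (1, 2)
p = by_recursion(alpha, 25)
max_gap = max(abs(p[k] - closed_form(alpha, k)) for k in p)
print(max_gap)
```

For $\alpha = 1.5$ one has $c = 1$, so the first two values are $P(\xi_\tau = 1) = 1/2$ and $P(\xi_\tau = 2) = 1/8$ under either formula.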

6.2. A queueing system result. The problem on a Galton-Watson tree
will essentially reduce to the following lemma.

Let $Q_t$ be the state of a queueing system where customers arrive at rate
$Ae^{ct}$ for some constants $A$ and $c > 0$. We assume that there are infinitely
many servers and that each customer requires an independent exponential
rate $\lambda$ amount of time to be served, so when the state of the queue is $m$,
the departure rate is $\lambda m$ per unit of time.

Lemma 31. As $t \to \infty$, almost surely

\[
e^{-ct} Q_t \to \frac{A}{\lambda + c}.
\]



Proof. Because all customers depart at rate $\lambda$, the number of customers
at time zero does not affect the limiting behavior of the queue as $t \to \infty$.
Therefore, we may assume that the number of customers at time zero is
Poisson with mean $A/(\lambda + c)$. The probability that a customer who arrives at
time $s < t$ is still in the queue at time $t$ is $e^{-\lambda(t-s)}$. Therefore, the distribution
of $Q_t$ is Poisson with mean

\[
\frac{Ae^{-\lambda t}}{\lambda + c} + \int_0^t Ae^{cs} e^{-\lambda(t-s)}\,ds = \frac{Ae^{ct}}{\lambda + c}.
\]

For all positive integers $n$, let $t_n = (3/c)\log n$, so $E[Q_{t_n}] = An^3/(\lambda + c)$.
Let $B_n$ be the event that $(1-\varepsilon)An^3/(\lambda+c) \le Q_{t_n} \le (1+\varepsilon)An^3/(\lambda+c)$.
Note that if $Z$ has a Poisson distribution with mean $\mu$, then

\[
P(|Z - \mu| > \varepsilon\mu) \le \frac{1}{\varepsilon^2 \mu},
\tag{36}
\]

by Chebyshev's inequality. Applying (36) with $\mu = An^3/(\lambda + c)$, we get

\[
P(B_n^c) \le \frac{\lambda + c}{\varepsilon^2 A n^3}.
\]
Therefore, by the Borel-Cantelli lemma, almost surely $B_n$ occurs for all but
finitely many $n$.

Between times $t_n$ and $t_{n+1}$, the number of arrivals is Poisson with mean

\[
\int_{t_n}^{t_{n+1}} Ae^{cs}\,ds = \frac{A\,((n+1)^3 - n^3)}{c} \le \frac{3A(n+1)^2}{c}.
\]

Therefore, the probability that there are more than $6A(n+1)^2/c$ arrivals
between times $t_n$ and $t_{n+1}$ is at most the probability that a Poisson random
variable with mean $3A(n+1)^2/c$ is greater than $6A(n+1)^2/c$, which by (36)
with $\varepsilon = 1$, is at most $c/(3A(n+1)^2)$. The number of departures between
times $t_n$ and $t_{n+1}$ also has a Poisson distribution and since $E[Q_t]$ is an
increasing function of $t$, the expected number of departures between times
$t_n$ and $t_{n+1}$ is also bounded by $3A(n+1)^2/c$. Therefore, the probability that
there are more than $6A(n+1)^2/c$ departures between times $t_n$ and $t_{n+1}$ is at
most $c/(3A(n+1)^2)$. Let $D_n$ be the event that between times $t_n$ and $t_{n+1}$,
there are at most $6A(n+1)^2/c$ arrivals and at most $6A(n+1)^2/c$ departures.
By the Borel-Cantelli lemma, almost surely $D_n$ occurs for all but finitely
many $n$.

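Suppose that $B_n$ and $D_n$ occur for all $n \ge N$. Suppose that $t_n \le t \le t_{n+1}$.
If $n \ge N$, then

\[
\frac{(1-\varepsilon)An^3}{\lambda + c} - \frac{6A(n+1)^2}{c}
  \le Q_t \le
\frac{(1+\varepsilon)An^3}{\lambda + c} + \frac{6A(n+1)^2}{c}.
\]

Because $1/(n+1)^3 \le e^{-ct} \le 1/n^3$, it follows that

\[
\limsup_{t\to\infty} e^{-ct} Q_t \le \frac{(1+\varepsilon)A}{\lambda + c} \quad \text{a.s.}
\]

and

\[
\liminf_{t\to\infty} e^{-ct} Q_t \ge \frac{(1-\varepsilon)A}{\lambda + c} \quad \text{a.s.}
\]

Since $\varepsilon > 0$ is arbitrary, the result follows. □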
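The statement of Lemma 31 can be illustrated by simulation. The sketch below is a minimal one, assuming an initially empty queue (the proof notes that the initial state does not affect the limit). It generates arrivals by time-changing a unit-rate Poisson process and keeps each customer with probability $e^{-\lambda(t-s)}$. The parameter values, the seed and the function names are illustrative assumptions.

```python
import math
import random

def simulate_queue(A, c, lam, t, rng):
    """Number of customers present at time t, starting from an empty queue.

    Arrivals form a Poisson process with rate A * exp(c * s); each customer
    stays for an independent exponential time with rate lam.
    """
    total_mass = A * (math.exp(c * t) - 1.0) / c  # integrated arrival rate on [0, t]
    present = 0
    x = rng.expovariate(1.0)  # points of a unit-rate Poisson process
    while x < total_mass:
        s = math.log(1.0 + c * x / A) / c  # arrival time obtained by time change
        if rng.random() < math.exp(-lam * (t - s)):  # still being served at time t
            present += 1
        x += rng.expovariate(1.0)
    return present

def mean_queue_length(A, c, lam, t):
    """Exact mean of Q_t for an initially empty queue."""
    return A * (math.exp(c * t) - math.exp(-lam * t)) / (lam + c)

A, c, lam, t = 5.0, 1.0, 1.0, 6.0  # illustrative parameters
rng = random.Random(12345)
q = simulate_queue(A, c, lam, t, rng)
print(math.exp(-c * t) * q, A / (lam + c))
```

With these parameters the mean of $Q_t$ is about 1000, so a single run of $e^{-ct}Q_t$ should typically fall within a few percent of $A/(\lambda + c) = 2.5$.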

Having proven this result, we easily deduce the following one. Suppose
that $(Q_t, t \ge 0)$ is the length of the queue in a queueing system, where the
arrival rate is a random process $a_t$ [i.e., the process of arrivals $(Q_t^+, t \ge 0)$ is a
counting process such that $Q_t^+ - \int_0^t a_s\,ds$ is a martingale] and the departure
rate at time $t$, which is nonrandom, is $\lambda(t)$ per customer. Then if $a_t$ and
$\lambda(t)$ have the correct asymptotics as $t \to \infty$, the asymptotics of $Q_t$ are also
the same as in the previous case.

Lemma 32. If $a_t \sim Ae^{ct}$ almost surely as $t \to \infty$ and $\lim_{t\to\infty} \lambda(t) = \lambda$,
then almost surely

\[
e^{-ct} Q_t \to \frac{A}{\lambda + c}.
\]

Proof. Let $A_t = \int_0^t a_s\,ds$. Since $Q_t^+$ is a counting process and $Q_t^+ - A_t$
is a martingale, there exists a Poisson process $N^1$ such that $Q_t^+ = N^1_{A_t}$. Let
$\varepsilon > 0$ and consider the function

\[
b_t = (A(1+\varepsilon)e^{ct} - a_t)^+.
\]

Let $N^2$ be an independent Poisson process. Compare the state of the queue
$(Q_t, t \ge 0)$ with the queue $(Q_{1,t}, t \ge 0)$, in which customers arrive with the
jumps of $(N^1_{A_t} + N^2_{B_t}, t \ge 0)$, where $B_t = \int_0^t b_s\,ds$, and customers get served
at rate $\lambda(t)$. By properties of Poisson processes, the arrival process of the
queue $Q_1$ is thus itself a Poisson process with rate $a_t + b_t$ per unit time.
Observe that for $t$ sufficiently large, $a_t \le A(1+\varepsilon)e^{ct}$, so for $t$ sufficiently
large, $b_t = A(1+\varepsilon)e^{ct} - a_t$. Thus, for $t$ sufficiently large, the total rate of
arrivals for the queue $Q_1$ is $a_t + b_t = A(1+\varepsilon)e^{ct}$. Let $(Q_{2,t}, t \ge 0)$ be the queue
where arrivals are given by $N^1_{A_t} + N^2_{B_t}$ when $a_t \le A(1+\varepsilon)e^{ct}$ and by a
Poisson process with rate $A(1+\varepsilon)e^{ct}$ built from $N^3$ otherwise, where
$(N^3_t, t \ge 0)$ is another independent Poisson process. Again,
assume that customers depart from the queue at rate $\lambda(t)$. Since customers
depart from $Q_1$ and $Q_2$ at the same rate, the queues can be coupled so that
they are identical after a certain random time $T$. Moreover, $(Q_{2,t}, t \ge 0)$ is a
queueing system where arrivals occur at rate $A(1+\varepsilon)e^{ct}$ throughout time.
Because $\lambda(t) \to \lambda$, we have $\lambda(t) \ge \lambda - \varepsilon$ for sufficiently large $t$. Therefore,
the queue $(Q_{2,t}, t \ge 0)$ can be coupled with another queue $(Q_{3,t}, t \ge 0)$ with
arrival rate $A(1+\varepsilon)e^{ct}$ and departure rate $\lambda - \varepsilon$ such that $Q_{2,t} \le Q_{3,t}$ for
sufficiently large $t$, because for $t$ sufficiently large, all customers depart $Q_2$
at least as quickly as they depart $Q_3$. Hence, by Lemma 31, almost surely

\[
\limsup_{t\to\infty} e^{-ct} Q_{2,t} \le \frac{A(1+\varepsilon)}{(\lambda - \varepsilon) + c}
\]

and similarly for $Q_1$, because $Q_1$ and $Q_2$ have the same asymptotics. By
construction, we also have that for all $t \ge 0$, $Q_t \le Q_{1,t}$, because every
customer who arrives in $Q$ also arrives in $Q_1$. By taking $\varepsilon \to 0$, this implies
that

\[
\limsup_{t\to\infty} e^{-ct} Q_t \le \frac{A}{\lambda + c}.
\]

Applying similar reasoning, we get
$\liminf_{t\to\infty} e^{-ct} Q_t \ge (1-\varepsilon)A/((\lambda+\varepsilon)+c)$
a.s. for all $\varepsilon > 0$, which implies the lemma. □



6.3. Almost sure result for a Galton-Watson tree. Recall that we are
considering the Galton-Watson tree associated with the branching process
$(\xi_t, t \ge 0)$, with mutation marks along the branches at rate $\theta e^{-s}$ at time $s$.
By Lemma 21, there is a random variable $W$ such that

\[
e^{-(m-1)t}\,\xi_t \to W \quad \text{a.s.}
\]

Recall that $M_k^{GW}(t)$ denotes the number of marks before time $t$ such that
the individual who gets the mutation has $k$ descendants at time $t$. Likewise,
let $N_k^{GW}(t)$ denote the number of blocks of size $k$ in the allelic partition at
time $t$, when we assume two individuals have different alleles if any of their
ancestors have had a mutation since their most recent common ancestor.

For the proof, we introduce two other quantities. Let $L_k(t)$ be the number
of mutations before time $t$ such that the individual who gets the mutation
has $k$ descendants alive at time $t$ and none of this individual's descendants
undergoes another mutation before time $t$. Let $K(t)$ be the number of
mutations before time $t$ such that some descendant of the individual that
undergoes the mutation also undergoes another mutation before time $t$. The
strategy of the proof will be to show that $M_k^{GW}(t)$ and $N_k^{GW}(t)$ both
behave asymptotically like $L_k(t)$, while $K(t)$ is of lower order. The lemma
below concerns $L_k(t)$.

Lemma 33. For all $k \ge 1$,

\[
e^{-ct} L_k(t) \to \frac{\theta W}{c}\, P(\xi_\tau = k) \quad \text{a.s.}
\]

Proof. Our first step is to prove that this result holds with a limit
being $\theta W a_k$ for some deterministic sequence of positive numbers $a_k$.

We prove this by induction on $k \ge 1$. For $k = 1$, observe that, conditionally
on the process $(\xi_t, t \ge 0)$, the process $L_1(t)$ can be viewed as a birth-and-death
chain in which the total birth rate is $\theta e^{-t}\xi_t$ and each individual dies at
rate $1 + \theta e^{-t}$. Indeed, $L_1(t)$ increases by one every time some branch gets hit
by a mutation. Since marks arrive at rate $\theta e^{-t}\,dt$ at time $t$ on each branch
of the Galton-Watson tree, this means that, conditional on $(\xi_t, t \ge 0)$, new
mutations occur at rate $\theta e^{-t}\xi_t\,dt$. Also, $L_1(t)$ decreases by one each time a
member of a family of size 1 either reproduces or experiences a mutation,
which happens at rate $1 + \theta e^{-t}$ for every individual. Because $e^{-(m-1)t}\xi_t \to W$
a.s., we can view $L_1(t)$ as a queueing system whose arrival rate is asymptotic
to $\theta W e^{ct}$ and whose departure rate converges to 1. Therefore, by conditioning
on $W$ and applying Lemma 32, we have

\[
e^{-ct} L_1(t) \to \frac{\theta W}{1 + c} \quad \text{a.s.}
\]

Because

\[
P(\xi_\tau = 1) = \int_0^\infty c\,e^{-cu}\, P(\xi_u = 1)\,du
  = \int_0^\infty c\,e^{-cu} e^{-u}\,du
  = \frac{c}{c+1},
\]

we can take $a_1 = (1/c)P(\xi_\tau = 1)$, which is deterministic.

Now suppose that $k \ge 2$. Note that families of size $k$ are obtained when an
individual in a family of size $j$ with $j \le k - 1$ reproduces and has $k - j + 1$
offspring. Therefore, the process $(L_k(t), t \ge 0)$ is a birth-and-death chain
with arrival rate

\[
\sum_{j=1}^{k-1} j\,L_j(t)\,P(\chi = k-j+1)\,dt.
\tag{37}
\]

We emphasize that this does not mean that conditionally on
$(L_j(t), t \ge 0, j = 1, \ldots, k-1)$, the process $L_k$ is a queueing system with
arrival rate (37). Indeed, the positive jump times of $L_k$ are necessarily negative
jump times of $L_j$ for some $j < k$. Instead, this means that the arrival process
$L_k^+$ for the queue $L_k$ is a counting process such that

\[
L_k^+(t) - \int_0^t \sum_{j=1}^{k-1} j\,L_j(s)\,P(\chi = k-j+1)\,ds
\tag{38}
\]

is a martingale and conditionally on $L_k^+$, the process $L_k(t)$ is independent
of the lower-level queues $L_j$, $j = 1, \ldots, k-1$. The departure rate at time $t$
is $k(1 + \theta e^{-t})$ because for each family of size $k$, there are $k$ individuals that
could reproduce or experience mutation.

In particular, the arrival rate (37) for $L_k(t)$ is almost surely asymptotic
to

\[
\theta W \Bigl( \sum_{j=1}^{k-1} j\,a_j\,P(\chi = k-j+1) \Bigr) e^{ct}.
\]

Applying Lemma 32 with $\lambda = k$, we conclude

\[
e^{-ct} L_k(t) \to \theta W a_k \quad \text{a.s.},
\]

where

\[
a_k = \frac{1}{k+c} \sum_{j=1}^{k-1} j\,a_j\,P(\chi = k-j+1).
\]

Thus, the constants $a_k$ satisfy the same recursion established in (33) for
$P(\xi_\tau = k)$. Because $a_1 = (1/c)P(\xi_\tau = 1)$, it follows that $a_k = (1/c)P(\xi_\tau = k)$
for all $k$. □



We now use this result to obtain the asymptotic behavior of the quantities
$M_k^{GW}$ and $N_k^{GW}$.

Lemma 34. For all $k \ge 1$, almost surely

\[
e^{-ct} M_k^{GW}(t) \to \frac{\theta W}{c}\, P(\xi_\tau = k)
\]

and

\[
e^{-ct} N_k^{GW}(t) \to \frac{\theta W}{c}\, P(\xi_\tau = k).
\]

Proof. Note that every mutation before time $t$ that is counted by $L_k(t)$
is inherited by $k$ individuals at time $t$. By the definition of $L_k(t)$, these
$k$ individuals experience no additional mutations, so they form a block of
the allelic partition at time $t$. It follows that $L_k(t) \le M_k^{GW}(t)$ and
$L_k(t) \le N_k^{GW}(t)$. Furthermore, if any mutation not counted by $L_k(t)$ is passed on
to $k$ individuals at time $t$ or gives rise to a block of size $k$ in the allelic
partition at time $t$, then some descendant of the individual that experiences
the mutation must experience another mutation before time $t$. Therefore, we
have $M_k^{GW}(t) \le L_k(t) + K(t)$ and $N_k^{GW}(t) \le L_k(t) + K(t)$. Thus, the result
will follow from Lemma 33 once we prove that

\[
\lim_{t\to\infty} e^{-ct} K(t) = 0 \quad \text{a.s.}
\tag{39}
\]

To prove (39), note that if $M(t)$ denotes the total number of mutations
before time $t$, then for all positive integers $N$,

\[
K(t) = M(t) - \sum_{k=1}^{\infty} L_k(t) \le M(t) - \sum_{k=1}^{N} L_k(t).
\]

Conditional on $(\xi_t, t \ge 0)$, the process $(M(t), t \ge 0)$ is a queueing system
with departure rate zero and arrival rate $\theta e^{-t}\xi_t$. Therefore, by Lemma 32,
we have $e^{-ct}M(t) \to \theta W/c$ a.s. By combining this result with Lemma 33, we
get

\[
\limsup_{t\to\infty} e^{-ct} K(t)
  \le \frac{\theta W}{c} - \sum_{k=1}^{N} \frac{\theta W}{c}\, P(\xi_\tau = k)
  = \frac{\theta W}{c}\, P(\xi_\tau > N).
\]

Letting $N \to \infty$ gives (39). □
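The last step uses only that $P(\xi_\tau > N) \to 0$. The sketch below makes the rate visible by comparing partial sums of (31) with the candidate closed form $P(\xi_\tau > N) = \Gamma(N+\alpha-1)/(\Gamma(\alpha-1)\,N!)$, which decays like $N^{\alpha-2}$. This closed form is not stated in the text; it is derived for this sketch by a short induction on $N$ and should be read as an assumption being tested. The function names and $\alpha = 1.5$ are illustrative.

```python
import math

def p_xi_tau(alpha, k):
    """Formula (31) for P(xi_tau = k), computed via log-gamma to avoid overflow."""
    log_p = (math.log(2 - alpha) + math.lgamma(k + alpha - 2)
             - math.lgamma(alpha - 1) - math.lgamma(k + 1))
    return math.exp(log_p)

def tail_closed_form(alpha, n):
    """Candidate closed form for P(xi_tau > n) (derived for this sketch, not from the text)."""
    return math.exp(math.lgamma(n + alpha - 1) - math.lgamma(alpha - 1) - math.lgamma(n + 1))

alpha = 1.5  # illustrative value in (1, 2)
partial = 0.0
gaps = []
for n in range(1, 2001):
    partial += p_xi_tau(alpha, n)
    # Compare 1 - (partial sum up to n) with the candidate tail formula.
    gaps.append(abs((1.0 - partial) - tail_closed_form(alpha, n)))
print(max(gaps), tail_closed_form(alpha, 2000))
```

The slow polynomial decay of the tail is also why the bound on $K(t)$ is obtained by first fixing $N$ and only then letting $N \to \infty$.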



Remark 35. Another consequence of this result is that the proportions
of families of size $k$ both in the infinite sites and the infinite alleles models
satisfy

\[
\frac{M_k^{GW}(t)}{M(t)} \to P(\xi_\tau = k)
\qquad \text{and} \qquad
\frac{N_k^{GW}(t)}{M(t)} \to P(\xi_\tau = k)
\]

almost surely. We will use this below.



6.4. Almost sure result for the Beta-coalescent tree. Let $u > 0$ and
consider the reduced tree $T^u$ at level $u$, which, we recall, has, at level $0 \le t \le 1$,
as many vertices as there are excursions between $tu$ and $u$. Suppose mutation
marks fall at intensity $\theta\,dt$ per unit length on this tree and for $k \ge 1$,
we let $M_k^u(t)$ be the number of families of size $k$ at level $0 \le t \le 1$ in the
infinite sites model, and let $N_k^u(t)$ be the equivalent quantity in the infinite
alleles model.

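Lemma 36. For fixed $u$, conditionally on $\sup_{0 \le s \le T_r} H_s > u$, almost surely
as $t \to 0$,

\[
t^c M_k^u(1-t) \to \frac{\theta}{c}\, K u^{-1/(\alpha-1)} Z_u\, P(\xi_\tau = k)
\tag{40}
\]

and

\[
t^c N_k^u(1-t) \to \frac{\theta}{c}\, K u^{-1/(\alpha-1)} Z_u\, P(\xi_\tau = k),
\tag{41}
\]

where $K = (\alpha-1)^{-1/(\alpha-1)}$.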



Proof. The proof follows from Lemma 34 in exactly the same way that
Lemmas 23 and 25 follow from the Kesten-Stigum theorem, the idea being
simply that we can again identify $W$ with $K u^{-1/(\alpha-1)} Z_u$ when we look at
the reduced tree at level $u$, $T^u$. □

Proof of Theorem 9. We first note that (40) may be strengthened
into the same result where the convergence holds almost surely for all $\theta$
simultaneously. Indeed, if we assume that mutation marks come with a label
$\theta$ in $(0, \infty)$ and that mutation marks fall on the tree with intensity $d\theta \otimes dt$,
where $dt$ stands for the unit length of the tree, we obtain a construction
of $M_k^u(t)$ for all $\theta$ simultaneously by considering those marks whose label
is smaller than $\theta$. (We note for later purposes that, independent of the
shape of the tree, such mutation marks may themselves be obtained from a
probability measure $Q$ which is a countable collection of independent Poisson
processes with intensity $d\theta \otimes dt$.) Observe that since $M_k^u(t)$ is monotone in
$\theta$ and since (40) holds for all rational $\theta$, it also holds for nonrational values of
$\theta$. To get (41) simultaneously for all $\theta$ in the infinite allele case as well, note
that $|M_k^{GW}(t) - N_k^{GW}(t)| \le K(t)$ for all $k$ and $t$. Since $K(t)$ is monotone
in $\theta$, the result (39) holds for all $\theta > 0$, so (41) also holds simultaneously for
all $\theta$.

Let $A_u$ be the event that (40) and (41) hold almost surely for all $\theta$
simultaneously. By applying Lemma 26 with the product probability $P \times Q$,
we may assume, without loss of generality, that (40) and (41) hold almost
surely for all $\theta$ also at level $u = R^{-1}(1)$, that is,

\[
P \times Q\bigl(A_{R^{-1}(1)}\bigr) = 1.
\]

Let $T_0 = T^{R^{-1}(1)}$ be the reduced tree at level $R^{-1}(1)$. In order to translate
the result to the Beta-coalescent tree, one more fact is needed, since the
coalescent tree is not exactly $T_0$, but a time-change of $T_0$. (Indeed, for $t \le 1$,
the coalescent tree $T$ has $\#\mathrm{exc}_{R^{-1}(1-t),\,R^{-1}(1)}$ branches at level $t$, rather
than $\#\mathrm{exc}_{tR^{-1}(1),\,R^{-1}(1)}$.) In fact, this simply translates into a change of the
intensity of the mutation marks for $T_0$. Indeed, for a given segment in the
coalescent tree, between level $R^{-1}(1-t)$ and $R^{-1}(1-s)$ for $s \le t$, there
is a Poisson number of marks with intensity $\theta(t-s)$. So, if $0 \le \tau \le \sigma \le 1$,
the number of marks on a segment of the reduced tree $T_0$ between levels
$\tau$ and $\sigma$ is also a Poisson random variable with parameter $\theta(t-s)$, with
$R^{-1}(1-t) = \tau R^{-1}(1)$ and $R^{-1}(1-s) = \sigma R^{-1}(1)$. Now, observe that as $t \to 0$
or $\tau \to 1$, this means that the intensity of the marks becomes asymptotic
to $\theta R^{-1}(1)/q$, where $q$ is the derivative of the function $R^{-1}(1-t)$ at $t = 0$,
which was shown to be

\[
q = \frac{1}{\alpha(\alpha-1)\Gamma(\alpha)}\, Z_{R^{-1}(1)}^{\alpha-1}
\]

in Lemma 27. Let $M_k^\Pi(t)$ be the number of families of size $k$ obtained from
the coalescent tree considered for all $s \ge t$. (I.e., this tree at level $s \ge 0$ has
$|\Pi_{t+s}|$ branches.) Using monotonicity of $M_k^{T_0}(t)$ (number of families of size $k$
in the infinite-site case on $T_0$) with respect to the intensity, this means that
for all $\varepsilon > 0$, for $t$ sufficiently small, $M_k^\Pi(t) \le M_k^{T_0}(1 - tq/R^{-1}(1))$, where
the intensity is $(\theta + \varepsilon)R^{-1}(1)/q$. Using (40) and the notation $u = R^{-1}(1)$,
we have

\[
\begin{aligned}
\limsup_{t\to 0} t^c M_k^\Pi(t)
  &\le \limsup_{t\to 0} t^c M_k^{T_0}\bigl(1 - tq/R^{-1}(1)\bigr) \\
  &\le \frac{(\theta+\varepsilon)u}{qc} \Bigl(\frac{u}{q}\Bigr)^{c} K u^{-1/(\alpha-1)} Z_u\, P(\xi_\tau = k) \\
  &\le \frac{\theta+\varepsilon}{c}\, (\alpha\Gamma(\alpha))^{1/(\alpha-1)}\, P(\xi_\tau = k)
\end{aligned}
\]

after simplification, recalling that $c = (2-\alpha)/(\alpha-1)$. We may proceed
similarly with the liminf, so we have proven that almost surely as $t \to 0$,

\[
t^c M_k^\Pi(t) \to \frac{\theta}{c}\, (\alpha\Gamma(\alpha))^{1/(\alpha-1)}\, P(\xi_\tau = k).
\]

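Combining this result with (39), we get

\[
t^c N_k^\Pi(t) \to \frac{\theta}{c}\, (\alpha\Gamma(\alpha))^{1/(\alpha-1)}\, P(\xi_\tau = k).
\]

The same calculations apply to show that the total number of marks $M^\Pi(t)$
satisfies

\[
t^c M^\Pi(t) \to \frac{\theta}{c}\, (\alpha\Gamma(\alpha))^{1/(\alpha-1)}
\]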

almost surely. We apply this convergence at times
$t = T_n = \inf\{t > 0 : |\Pi_t| \le n\}$.
Recall that $T_n \sim (\alpha\Gamma(\alpha))\,n^{1-\alpha}$ almost surely and that when $|\Pi(T_n)| = n$
(i.e., if the coalescent ever has $n$ blocks), then $M^\Pi(T_n)$ is identical to $M(n)$.
On the other hand, by Theorem 1.8 in [8], $\lim_{n\to\infty} P(|\Pi_{T_n}| = n) = \alpha - 1 > 0$,
so conditioning on this event which has asymptotically positive probability,
we find that

\[
n^{\alpha-2} M(n) \to_p \frac{\theta\,\alpha(\alpha-1)\Gamma(\alpha)}{2-\alpha}
\]

(this argument is similar to the one for Theorem 1.9 in [8]). On the other
hand, the total number of families $M(n)$ is Poisson with parameter $\theta L_n$,
conditionally on $L_n$, where $L_n$ is the total length of the tree, so this gives
another proof of Theorem 1.9 in [8] which states that

\[
n^{\alpha-2} L_n \to_p \frac{\alpha(\alpha-1)\Gamma(\alpha)}{2-\alpha}.
\]

We conclude similarly that

\[
n^{\alpha-2} M_k(n) \to_p \frac{\theta\,\alpha(\alpha-1)\Gamma(\alpha)}{2-\alpha}\, P(\xi_\tau = k).
\]

It follows immediately that the same convergence holds for $N_k(n)$ and this
concludes the proof of Theorem 9. □
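The passage from the small-time limit to the sample-size limit is pure exponent arithmetic: substituting $t = T_n \sim \alpha\Gamma(\alpha)\,n^{1-\alpha}$ into $t^{-c}\,(\theta/c)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}$ and using $c = (2-\alpha)/(\alpha-1)$ should give $\theta\alpha(\alpha-1)\Gamma(\alpha)\,n^{2-\alpha}/(2-\alpha)$. The sketch below checks this numerically, treating $T_n$ as exactly equal to its asymptotic equivalent. The function names and parameter values are illustrative assumptions.

```python
import math

def small_time_constant(theta, alpha):
    """Limit of t^c * M^Pi(t): (theta / c) * (alpha * Gamma(alpha))^(1 / (alpha - 1))."""
    c = (2 - alpha) / (alpha - 1)
    return (theta / c) * (alpha * math.gamma(alpha)) ** (1 / (alpha - 1))

def sample_size_constant(theta, alpha):
    """Limit of n^(alpha - 2) * M(n): theta * alpha * (alpha - 1) * Gamma(alpha) / (2 - alpha)."""
    return theta * alpha * (alpha - 1) * math.gamma(alpha) / (2 - alpha)

def via_hitting_time(theta, alpha, n):
    """n^(alpha - 2) * small_time_constant * T_n^(-c), with T_n = alpha * Gamma(alpha) * n^(1 - alpha)."""
    c = (2 - alpha) / (alpha - 1)
    t_n = alpha * math.gamma(alpha) * n ** (1 - alpha)
    return n ** (alpha - 2) * small_time_constant(theta, alpha) * t_n ** (-c)

theta, alpha = 2.0, 1.5  # illustrative values
print(via_hitting_time(theta, alpha, 10 ** 6), sample_size_constant(theta, alpha))
```

Multiplying either constant by $P(\xi_\tau = k)$ from (31) gives the corresponding limit for families of size $k$.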

Corollary 37. Let $K(n)$ be the size of a family chosen uniformly at
random among all $M(n)$ families when the population has $n$ individuals.
Then

\[
K(n) \to_d \xi_\tau.
\]

This is just a reformulation of the fact that the proportions of families of
size $k$ converge to $P(\xi_\tau = k)$. Note that $\xi_\tau < \infty$ almost surely, meaning that,
asymptotically, a typical family stays of finite size.